On the Performance of Machine Learning Based Flight Delay Prediction – Investigating the Impact of Short-Term Features
Keywords: flight delay prediction, machine learning, aviation, feature importance, classification, SHAP
People and companies around the world are increasingly connected, which has made the aviation industry more important. Flight delays are a major challenge in aviation, and machine learning algorithms can be used to forecast them. This paper investigates predicting the occurrence of flight arrival delays with three prominent machine learning algorithms on a data set of domestic flights in the USA. The task is treated as a classification problem. The focus is on how short-term features influence the quality of the results. To this end, three scenarios characterised by different input feature sets are created. When short-term information is omitted so that the prediction can be made at an early point in time, an accuracy of 69.5% with a recall of 68.2% is achieved. Including the delay that the aircraft had on its previous flight increases the prediction quality slightly. This second scenario is therefore a compromise between the early prediction timing of the first model and the good prediction quality of the third model, in which the departure delay of the aircraft is added as an input feature. In the third scenario, an accuracy of 89.9% with a recall of 83.4% is obtained. Because short-term features significantly improve the prediction quality, the desired timing of the prediction determines which features to use as inputs.
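The two reported metrics and the three feature scenarios can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the scenario labels, the choice to make the feature sets cumulative, and the toy labels are hypothetical and are not taken from the paper's data set or code.

```python
# Minimal sketch: three nested feature scenarios plus accuracy and recall.
# All feature names and labels below are hypothetical, not from the paper.

# Features assumed to be known well before the flight (hypothetical names).
BASE_FEATURES = ["month", "day_of_week", "scheduled_departure_hour",
                 "origin_airport", "destination_airport"]

# Assumption: each scenario extends the previous one by one short-term feature.
SCENARIOS = {
    "early": BASE_FEATURES,
    "previous_flight": BASE_FEATURES + ["previous_flight_delay"],
    "departure": BASE_FEATURES + ["previous_flight_delay", "departure_delay"],
}


def accuracy(y_true, y_pred):
    """Share of all flights whose class was predicted correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actually delayed flights that were predicted as delayed."""
    true_pos = sum(t == positive and p == positive
                   for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Toy labels (1 = delayed arrival, 0 = on time); made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

if __name__ == "__main__":
    for name, features in SCENARIOS.items():
        print(name, len(features), "features")
    print("accuracy:", accuracy(y_true, y_pred))
    print("recall:", recall(y_true, y_pred))
```

For the toy labels above, both metrics evaluate to 0.75: six of eight flights are classified correctly, and three of the four delayed flights are detected. Recall is worth reporting alongside accuracy here because it measures how many truly delayed flights the model catches, which accuracy alone can hide when on-time flights dominate the data.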
Copyright (c) 2022 Delia Schösser, Jörn Schönberger
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.