Prediction and Investigation of the Injury Severity of Drivers Involved in Speeding-Related Crashes Using Machine Learning Models
Speeding is the major cause of road traffic crashes and deaths in India; other driver faults include driving under the influence, using mobile phones while driving and driving on the wrong side of the road. This study therefore predicts and investigates driver injury severity (DIS) in speeding-related crashes. A total of 793 police-reported single-vehicle and two-vehicle crash records from Imphal City, India, collected between 2011 and 2020, were analysed and modelled. For DIS prediction, eleven supervised machine learning (ML) models were implemented using 5-fold and 10-fold cross-validation (5-FCV and 10-FCV) and trained at train ratio (TR) values of 0.5, 0.6, 0.7 and 0.8 in each FCV. The top ML model for DIS prediction was selected based on the best combination of recall, accuracy, F1 score, area under the curve (AUC) and precision. Feature importance analysis (FIA) was conducted to determine the most impactful factors in DIS prediction. In 5-FCV, the gradient boosting tree (GBT), stochastic gradient descent, decision tree and lasso-LARS models were the top-performing ML models at TR = 0.5, 0.6, 0.7 and 0.8, respectively. In 10-FCV, the best-performing ML models were light GBM (TR = 0.5 and 0.7), GBT (TR = 0.6) and lasso-LARS (TR = 0.8). The FIA results indicated that vehicle type (two-wheeler), nature of crash (head-on collision) and time of crash (12 PM–6 PM and 6 AM–12 PM) were the most impactful variables for DIS prediction in Imphal speeding-related crashes. These ML models can be employed for DIS prediction in hilly areas. The study results can help transportation planners design road safety measures and strategies to lessen DIS in speeding-related crashes.
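As an illustration of the model-selection step described above, the following minimal Python sketch computes the five reported metrics for a binary severity label and combines them into a single score. It is not the authors' pipeline. The abstract does not state how the "best combination" of metrics was defined, so the unweighted mean used here is an assumption, as are the function names `binary_metrics` and `composite_score`, the 0.5 decision threshold, and the toy labels and scores, which are made-up values rather than the Imphal crash data.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and AUC for binary labels (1 = positive class)."""
    # Turn predicted scores into hard labels at the (assumed) threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # AUC as the probability that a random positive case receives a higher
    # score than a random negative case (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"recall": recall, "accuracy": accuracy, "f1": f1,
            "auc": auc, "precision": precision}


def composite_score(metrics: Dict[str, float]) -> float:
    """Unweighted mean of the five metrics (an assumed combination rule)."""
    return sum(metrics.values()) / len(metrics)


# Toy example: hypothetical labels and model scores, not study data.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_metrics = binary_metrics(toy_true, toy_score)
print(toy_metrics)
print("composite:", composite_score(toy_metrics))
```

Under this assumed rule, each candidate model would be scored this way at every TR and FCV setting, and the model with the highest composite score would be taken as the top performer for that setting.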
Copyright (c) 2025 Neero Gumsar SORUM, Martina Gumsar SORUM

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.













