Prediction and Investigation of the Injury Severity of Drivers Involved in Speeding-Related Crashes Using Machine Learning Models

Keywords: speeding, driver injury severity, machine learning, Dataiku, gradient boosting tree, lasso-LARS

Speeding is the leading cause of road traffic crashes and deaths in India. Other driver faults include driving under the influence, using mobile phones while driving and driving on the wrong side of the road. This study therefore attempts to predict and investigate driver injury severity (DIS) in speeding-related crashes. Data on 793 police-reported single-vehicle and two-vehicle crashes in Imphal City, India, collected between 2011 and 2020, were analysed and modelled. For DIS prediction, eleven supervised machine learning (ML) models were implemented using 5-fold and 10-fold cross-validation (5-FCV and 10-FCV) and trained at train ratio (TR) values of 0.5, 0.6, 0.7 and 0.8 in each FCV. The top ML model for DIS prediction was selected based on the best combination of recall, accuracy, F1 score, area under the curve (AUC) and precision. Feature importance analysis (FIA) was conducted to determine the most impactful factors in DIS prediction. In 5-FCV, the gradient boosting tree (GBT), stochastic gradient descent, decision tree and lasso-LARS models were the top-performing ML models at TR = 0.5, 0.6, 0.7 and 0.8, respectively. In 10-FCV, LightGBM (TR = 0.5 and 0.7), GBT (TR = 0.6) and lasso-LARS (TR = 0.8) were the best-performing ML models. The FIA results indicated that vehicle type (two-wheeler), nature of crash (head-on collision) and time of crash (12 PM–6 PM and 6 AM–12 PM) were the most impactful variables for DIS prediction in speeding-related crashes in Imphal. These ML models can be employed to predict DIS accurately in hilly areas. The study results can help transportation planners design road safety measures and strategies to lessen DIS in speeding-related crashes.
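The evaluation design described in the abstract (two fold counts crossed with four train ratios) can be sketched as follows. This is a minimal illustration, not the authors' Dataiku configuration: it assumes the train ratio sets a hold-out split and that the k folds are drawn from the training portion only, which the abstract does not state. The function names `holdout_split`, `k_folds` and `experiment_grid` are hypothetical, and the sketch works on row indices only, with no crash data.

```python
import random

N_CRASHES = 793                      # sample size reported in the abstract
TRAIN_RATIOS = (0.5, 0.6, 0.7, 0.8)  # TR values reported in the abstract
FOLD_COUNTS = (5, 10)                # 5-FCV and 10-FCV


def holdout_split(n_rows, train_ratio, seed=0):
    """Shuffle row indices and split them into train and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_ratio)  # floor; an assumed rounding rule
    return indices[:n_train], indices[n_train:]


def k_folds(train_indices, k):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    # Strided slices give k disjoint folds whose sizes differ by at most 1.
    folds = [train_indices[i::k] for i in range(k)]
    for i, validate in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


def experiment_grid(n_rows=N_CRASHES):
    """Describe every (fold count, train ratio) configuration."""
    grid = []
    for k in FOLD_COUNTS:
        for tr in TRAIN_RATIOS:
            train, test = holdout_split(n_rows, tr)
            fold_sizes = [len(validate) for _, validate in k_folds(train, k)]
            grid.append({"k": k, "tr": tr, "n_train": len(train),
                         "n_test": len(test), "fold_sizes": fold_sizes})
    return grid


if __name__ == "__main__":
    for row in experiment_grid():
        print(row)
```

Under these assumptions the grid has eight configurations. In each one, every candidate model would be fitted on the `fit` indices and scored on the `validate` indices before being compared on recall, accuracy, F1 score, AUC and precision.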