PREDICTING BUS TRAVEL TIME WITH HYBRID INCOMPLETE DATA – A DEEP LEARNING APPROACH

Predicting bus travel time with real-time information, including Global Positioning System (GPS) and Electronic Smart Card (ESC) data, is an effective way to improve the level of service by reducing wait time and improving schedule adherence. However, missing information in the data stream is inevitable for various reasons and may seriously affect prediction accuracy. To address this problem, this research proposes a Long Short-Term Memory (LSTM) model to predict bus travel time with incomplete data. To improve the accuracy and efficiency of the model, a Genetic Algorithm (GA) is developed and applied to optimise the hyperparameters of the LSTM model. The model performance is assessed with simulation and real-world data. The results suggest that the proposed approach with hybrid data outperforms the approaches with ESC and GPS data individually. With GA, the proposed model achieves a lower Root Mean Square Error (RMSE) than the traditional one. The prediction accuracy with various combinations of ESC and GPS data is also assessed. The results can serve as a guideline for transit agencies to deploy GPS devices in a bus fleet considering the market penetration of ESC.


INTRODUCTION
The growing demand for freight and passenger transport needs to be met in order to prevent negative externalities. To maximise the total profit under growing freight demand, many studies have proposed ship or truck scheduling models [1,2]. For passenger transit agencies, adjusting the structure of the transportation system to growing passenger flows is of great importance [3,4]. Moreover, a drastic increase in passenger demand deteriorates the service quality and efficiency of bus transportation systems, especially during peak hours [5]. Therefore, developing a model to enhance the service quality is desirable to facilitate bus operation.
Maintaining schedule adherence and promoting the level of service are challenging for bus transit agencies. Poor schedule adherence increases passenger waiting time and reduces the attractiveness of bus service. Developing a sound approach to predict bus travel time in real time is desirable to facilitate bus operation. However, bus travel time is stochastic and sometimes difficult to predict because of many factors, such as passenger boarding/alighting demand, traffic conditions and delays at intersections [6,7], especially in peak periods. Providing reliable and accurate bus travel time would be an effective way to improve the level of service [8-10].
Bus arrival/departure times at stops reported by GPS data can be used to develop bus travel time prediction models [11]. The automated data-collection system in the Xi'an Traffic Information Centre collects ESC and GPS data, which is used to monitor passenger flow and the location of buses in the urban bus transit system in real time [12,13]. However, the GPS information (e.g. latitude/longitude coordinates) is obtained at a fixed time interval (e.g. 15 seconds), which may affect prediction accuracy and stability. Therefore, the ESC data can be applied to fill the gap and improve prediction accuracy. Zhou et al. [14] predicted bus travel time using hybrid data. The results suggested that the model using hybrid data outperformed those using GPS data or ESC data individually. Previous studies assumed that all buses were equipped with GPS devices and all passengers used smart cards. However, in a practical urban network, missing information in the data stream is unavoidable for various reasons. For example, many agencies do not equip the entire bus fleet with GPS devices, and in some situations the smart card records are unavailable (e.g. passengers who pay cash for ticket fares, stops without boarding passengers) [15,16]. These situations may jeopardise prediction accuracy and stability [17].
Bus travel time is dynamic and stochastic, affected by various factors such as passenger boarding/alighting time, traffic conditions and delays at intersections [18]. A Recurrent Neural Network (RNN) considers a sequence of data inputs and has been widely used in time sequence analysis. The LSTM is an advanced form of RNN, which is robust for predicting bus travel time [19,20]. However, LSTM is characterised by a set of hyperparameters, which shall be effectively determined by a sound algorithm to yield the best performance.
The focus of this study is to develop an LSTM model for predicting bus travel time, considering incomplete data. A GA is developed and applied to calibrate the hyperparameters of the LSTM model. Our findings show that hyperparameters optimised via GA can effectively enhance the prediction accuracy of the model. Moreover, the available proportion of GPS data and ESC data affects the prediction accuracy. Transit agencies shall select the type and amount of data used to predict bus travel time according to the penetration of smart card users as well as the fraction of buses equipped with GPS devices. The following section reviews previous studies on bus travel time prediction. Then, Section 3 discusses the proposed approach to predict bus travel time using incomplete data. Section 4 presents a case study in Xi'an, China. Section 5 reports the results of the case study. Finally, Section 6 summarises the research findings and discusses future research.

LITERATURE REVIEW
Bus travel time is stochastic because of traffic conditions between stops, dwell time at stops and delays at intersections, which fluctuate spatially and temporally. Providing reliable and accurate bus travel time is one of the effective ways to enhance the level of service [8]. However, developing a sound model considering incomplete data is a challenging task. Dai and Mu [17] predicted bus travel time with various degrees of missing ESC data. It was found that prediction accuracy decreased as the amount of missing data increased. Previous studies demonstrated that using real-time information (e.g. GPS data and/or ESC data) can improve prediction accuracy. However, missing information in GPS and ESC data streams is common for various reasons, such as system instability caused by data communication under different geographical and environmental restrictions. Only a few studies predicted bus travel time considering the impact of missing data [16].
Data-driven models can map complex relations between the input factors and bus travel time without an explicit functional form [21]. They can be classified into Support Vector Machine (SVM) models, Kalman Filtering (KF) models, and Artificial Neural Network (ANN) models.
Zhong et al. [22] developed an SVM model to predict bus travel time. Yang et al. [23] integrated SVM and GA to predict bus travel time. Later, Peng et al. [24] enhanced the GA-SVM model with a principal component analysis algorithm. The results suggested that the enhanced SVM model outperformed the traditional SVM model. Bus travel time on a particular path has time sequence characteristics (i.e. consecutive buses operate in similar traffic conditions) [25]. However, the SVM model has limitations in forecasting time sequence information, such as bus travel time.
KF is capable of updating the state variable with new observations and has been widely applied to predict time sequence information [26]. Chien and Kuchipudi [27] developed a KF model to predict bus travel time using GPS and Automatic Passenger Counter (APC) data. Jairam et al. [28] applied a KF model to predict bus travel time. The results suggested that the KF model outperformed the SVM and the historical mean prediction model. Due to the inherent limitations of the Markov property, the performance of the KF model deteriorated as the number of time steps increased [29].
ANN models have been widely applied to predict bus travel time. Jeong and Rilett [30] used ANN to predict bus arrival time using Automatic Vehicle Location (AVL) systems. The results suggested that ANN outperformed SVM. RNN is an advanced form of ANN, which is capable of forecasting bus travel time [31]. Many deep learning methods, such as LSTM, Gated Recurrent Unit (GRU), Bi-directional Long Short-Term Memory (Bi-LSTM), Bi-directional Gated Recurrent Unit (Bi-GRU) and Graph Convolution Network (GCN), have structures similar to RNN and have been widely applied to predict time sequence information. LSTM is an advanced RNN structure, which is robust for predicting bus travel time [19,20]. Liu et al. [32] developed an LSTM model to predict bus travel time using GPS data and found that LSTM outperformed RNN. GRU is a simpler alternative to LSTM, characterised by fewer hyperparameters, whose performance can be improved by enhancing its training process [33]. Zhai et al. [34] developed a GRU model to predict vehicle speed and found that GRU outperformed a convolutional neural network (CNN) model. However, GRU simplifies the LSTM by reducing long-term time series feedback, which limits its ability to forecast stochastic information [35]. Bi-LSTM and Bi-GRU are advanced LSTM and GRU structures, respectively, which train on data from two directions. Xue et al. [36] developed a Bi-LSTM model to predict expressway traffic flow, which exhibited higher prediction accuracy during off-peak hours. Shu et al. [37] developed a Bi-GRU model to predict short-term traffic flow and found that Bi-GRU outperformed GRU. However, the training of Bi-LSTM and Bi-GRU is computationally expensive [37]. The travel time of a bus is highly correlated with the travel times of its leading buses (e.g. under accidents and traffic jams). Thus, training from two directions is unnecessary. GCN is used to extract the features of the topology structure in time sequence information and has been applied to predict bus travel time, speed and passenger flow. However, it is computationally expensive [38].
LSTM is characterised by a set of hyperparameters, which shall be effectively determined to yield the best prediction performance, for example using GA and Evolutionary Algorithm (EA). Zhao and Zhang [39] proposed a hybrid model including a learning-based algorithm and EA to solve multi-objective optimisation problems. Dulebenets [40] proposed an improved GA to optimise the truck schedule. Liu et al. [41] proposed an EA with an angle-based selection strategy and a shift-based density estimation strategy to optimise multi-objective problems. Pasha et al. [42] proposed an EA to optimise a supply chain problem and found that EA outperforms other metaheuristic algorithms (i.e. Variable Neighbourhood Search, Tabu Search, and Simulated Annealing). D'Angelo et al. [43] proposed a hybrid deep learning model using GA and a decision-tree model to distinguish between meningitis etiologies using standard and clinical datasets.

METHODOLOGY
The objective of this study is to develop a model to predict bus travel time from stop i to all downstream stops j considering missing data. To discuss the model development, the trajectories of buses on a general route are illustrated in Figure 1.

Figure 1 - Trajectories of buses on a general route (time axis; predicted and actual bus trajectories)
In Figure 1, $k$ represents the index of buses, and $i$ and $j$ are indices of stops. Note that bus $k-1$ is dispatched immediately before bus $k$.
The travel time of bus $k$ from stop $i$ to stop $j$, denoted as $T_{ijk}$, can be determined by Equation 1 [13].
$$T_{ijk} = a_{jk} - d_{ik} \qquad (1)$$
where $a_{jk}$ is the arrival time of bus $k$ at stop $j$, and $d_{ik}$ is the departure time of bus $k$ at stop $i$. $T_{ijk}$ can be predicted from historic data, such as the travel times of bus $k-1$ and bus $k-2$, denoted as $T_{ij,k-1}$ and $T_{ij,k-2}$.
Considering the availability of the ESC and GPS data, four scenarios are considered to obtain the actual arrival/departure time information. At stops where passengers use smart cards and, at the same time, accurate GPS signals are received, the actual arrival time $a_{ik}$ is the earlier arrival time, and the actual departure time $d_{ik}$ is the later departure time, reported from ESC and GPS data. Thus,
$$a_{ik} = \min\left(a^{E}_{ik},\, a^{G}_{ik}\right), \qquad d_{ik} = \max\left(d^{E}_{ik},\, d^{G}_{ik}\right)$$
where $a^{E}_{ik}$ and $a^{G}_{ik}$ are the arrival times of bus $k$ at stop $i$ reported by ESC and GPS data, respectively, and $d^{E}_{ik}$ and $d^{G}_{ik}$ are the corresponding departure times.
At stops without passengers who use smart cards where, at the same time, accurate GPS signals cannot be received, the bus travel time from stop $i$ to downstream stops refers to the travel time of the immediate leading bus.
The proposed LSTM model is developed with hybrid data, and a GA is applied to optimise its hyperparameters. As shown in Figure 2, the model consists of a set of cells ($C_{ijk}$) and is used to predict the travel time ($T_{ijk}$) of bus $k$ between stops $i$ and $j$.
Cell $C_{ijk}$ consists of an input layer with the travel times from stop $i$ to stop $j$ of some previous buses. The number of inputs (i.e. previous buses) depends on a hyperparameter of the LSTM model, the lag size. The lag size ($L$) refers to the number of consecutive travel times given as input of the model.
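The data preparation described above (fusing ESC and GPS timestamps, Equation 1, and collecting the travel times of preceding buses as model input) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, timestamps are assumed to be seconds since midnight, and the handling of the two intermediate scenarios (only one data source available, in which case that source is used as reported) is an assumption, since only the first and last scenarios are spelled out in the text.

```python
from typing import Callable, List, Optional

def fuse_time(esc: Optional[float], gps: Optional[float],
              pick: Callable[[float, float], float]) -> Optional[float]:
    """Combine one ESC and one GPS timestamp; None marks a missing record."""
    if esc is not None and gps is not None:
        return pick(esc, gps)                 # both sources available
    # Assumption: with a single source, use it; returns None if both are missing.
    return esc if esc is not None else gps

def arrival(esc: Optional[float], gps: Optional[float]) -> Optional[float]:
    """Actual arrival time: the earlier of the two reported arrivals."""
    return fuse_time(esc, gps, min)

def departure(esc: Optional[float], gps: Optional[float]) -> Optional[float]:
    """Actual departure time: the later of the two reported departures."""
    return fuse_time(esc, gps, max)

def travel_time(dep_i: Optional[float], arr_j: Optional[float],
                leading_bus_time: float) -> float:
    """Equation 1, T_ijk = a_jk - d_ik, falling back to the travel time of the
    immediate leading bus when either timestamp is unavailable."""
    if dep_i is None or arr_j is None:
        return leading_bus_time
    return arr_j - dep_i

def lag_window(history: List[float], lag: int) -> List[float]:
    """Travel times of the `lag` most recent preceding buses, oldest first."""
    if lag < 1 or len(history) < lag:
        raise ValueError("need at least `lag` preceding travel times")
    return history[-lag:]
```

For example, `arrival(100.0, 104.0)` gives `100.0`, `departure(130.0, 126.0)` gives `130.0`, and `travel_time(130.0, 400.0, 250.0)` gives `270.0`; when the departure is missing, `travel_time(None, 400.0, 250.0)` falls back to `250.0`.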

Cell $C_{ijk}$ also contains one hidden layer, which consists of an intermediate layer and a final layer, and one output layer, which gives the predicted travel time of bus $k$ between stops $i$ and $j$ ($T_{ijk}$).
In the input layer, if a preceding bus has arrived at stop $j$, its travel time can be determined by Equation 1. Note that the number of preceding buses is determined by the lag size of the LSTM model.
The node of the intermediate layer in cell $C_{ijk}$, denoted as $\bar{c}_{ijk}$, is determined by Equation 6 [16], where $b_{cij}$ and $w_{cij}$ represent a bias and a weight matrix associated with the cells of the intermediate layer, respectively, which are adjusted in the LSTM model, and $\sigma$ represents a sigmoid function. The output of the hidden layer in cell $C_{ijk}$, denoted as $c_{ijk}$, is determined by $\bar{c}_{ijk}$ and $c_{ij,k-1}$ with Equations 7-9 [16], where $F_{ijk}$ and $I_{ijk}$ are the outputs of the forget gate and the input gate of cell $C_{ijk}$; $b_{Fij}$ and $w_{Fij}$ represent the bias and weight matrices of the forget gate; and $b_{Iij}$ and $w_{Iij}$ represent the bias and weight matrices of the input gate, respectively. The output of cell $C_{ijk}$ is $T_{ijk}$, which can be determined from $c_{ijk}$ with Equations 10 and 11 [16], where $O_{ijk}$ is the output of the output gate; tanh is the hyperbolic tangent function; and $b_{Oij}$ and $w_{Oij}$ represent the bias and weight matrices of the output gate, respectively.
The proposed LSTM model is developed using a module of the MATLAB toolbox named lstmLayer. Batch size ($B$) and lag size ($L$) are two hyperparameters to be optimised to yield the best performance in terms of accuracy [44].
Batch size ($B$) refers to how many cells are used in the LSTM model, which dictates the bias and weight matrices (e.g. $b_{cij}$ and $w_{cij}$). Lag size ($L$) refers to the number of previous bus travel times given as input of the model.

Figure 2 - Configuration of the LSTM model
The grid search algorithm is commonly applied to tune the hyperparameters $B$ and $L$. However, the process is complicated and computationally expensive. In this study, we developed a GA to optimise $B$ and $L$ by minimising the RMSE expressed by Equation 12 [45].
$$RMSE = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(\hat{T}_{ijk} - T_{ijk}\right)^{2}} \qquad (12)$$
where $\hat{T}_{ijk}$ and $T_{ijk}$ are the predicted and actual travel times, respectively; $k$ is the index of buses; and $N$ is the number of samples.
So that maximising fitness minimises RMSE, the fitness function of the GA is the inverse of RMSE, as shown in Equation 13.
$$f = \frac{1}{RMSE} \qquad (13)$$
The GA starts by setting parameters and initialising a set of feasible hyperparameters (i.e. $B$ and $L$). Then, the LSTM model employs these hyperparameters to predict travel time, followed by computing the RMSE of the predicted travel time against the actual travel time. Then, new solutions are produced through selection, crossover, mutation and fitness evaluation, until the terminating condition (i.e. the maximum number of generations) is satisfied. The detailed description is given below and illustrated in Figure 3.
Step 1: Specify the parameters of the GA, including population size (20), crossover rate (0.9), mutation rate (0.1), and the maximum number of iterations ($t_{max}=30$), then generate the initial set of random feasible solutions and initialise the generation counter ($t=0$).
Step 2: Run the LSTM model with the hyperparameters suggested by the initial set of solutions and calculate the fitness function of each solution with Equation 13. After that, pass the initial set of solutions and the fitness value of each solution to Step 3.
Step 3: Apply tournament selection (i.e. randomly select two solutions from the initial set of solutions and then choose the one yielding the better fitness value). This process is repeated until the number of solutions is equal to the population size. Then, a new set of solutions is produced via uniform crossover (i.e. randomly select a portion of the genes of two parents, then exchange these genes to produce new children) and random mutation (i.e. randomly select two genes from a parent, then exchange these genes to produce a new child). After that, the fitness function of each solution is calculated with Equation 13.
Step 4: Check if the maximum number of iterations is attained. If not, replace the initial set of solutions with the new set of solutions, update the generation counter and go to Step 2; otherwise, terminate the algorithm and report the best hyperparameters (i.e. the solution with the best fitness value in the set of solutions).
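Steps 1 to 4 can be sketched as a small Python program. This is only an illustration under stated assumptions, not the authors' MATLAB implementation: `surrogate_rmse` is a made-up stand-in for training the LSTM and returning its RMSE, the search ranges for B and L are hypothetical, and random-reset mutation is swapped in for the gene-exchange mutation of Step 3, because exchanging the only two genes (B and L) would mix values from different ranges. The population size, crossover rate, mutation rate and number of iterations follow Step 1.

```python
import random

# Hypothetical search ranges; the text does not state the bounds used.
B_RANGE = (1, 64)   # batch size B
L_RANGE = (1, 10)   # lag size L

def surrogate_rmse(B, L):
    """Made-up stand-in for 'train the LSTM with (B, L), return RMSE in seconds'."""
    return 40.0 + 0.05 * (B - 16) ** 2 + 1.5 * (L - 4) ** 2

def fitness(sol, rmse_fn):
    """Equation 13: fitness is the inverse of RMSE."""
    return 1.0 / rmse_fn(*sol)

def tournament(pop, fits, rng):
    """Binary tournament: draw two distinct solutions, keep the fitter one."""
    a, b = rng.sample(range(len(pop)), 2)
    return pop[a] if fits[a] >= fits[b] else pop[b]

def uniform_crossover(p1, p2, rng):
    """Exchange each gene between the two parents with probability 0.5."""
    c1, c2 = list(p1), list(p2)
    for g in range(len(c1)):
        if rng.random() < 0.5:
            c1[g], c2[g] = c2[g], c1[g]
    return tuple(c1), tuple(c2)

def mutate(sol, rate, rng):
    """Random-reset mutation (an assumption, see the lead-in)."""
    B, L = sol
    if rng.random() < rate:
        B = rng.randint(*B_RANGE)
    if rng.random() < rate:
        L = rng.randint(*L_RANGE)
    return (B, L)

def run_ga(rmse_fn, pop_size=20, cx_rate=0.9, mut_rate=0.1, t_max=30, seed=0):
    """Return (B, L, rmse) of the best solution seen; pop_size must be even."""
    rng = random.Random(seed)
    # Step 1: random feasible initial solutions, generation counter t = 0
    pop = [(rng.randint(*B_RANGE), rng.randint(*L_RANGE)) for _ in range(pop_size)]
    best_fit, best_sol = 0.0, None
    for t in range(t_max + 1):
        # Step 2: evaluate every solution with the fitness function
        fits = [fitness(s, rmse_fn) for s in pop]
        gen_fit, gen_sol = max(zip(fits, pop))
        if gen_fit > best_fit:
            best_fit, best_sol = gen_fit, gen_sol
        # Step 4: stop once the maximum number of iterations is reached
        if t == t_max:
            break
        # Step 3: selection, crossover and mutation build the next generation
        parents = [tournament(pop, fits, rng) for _ in range(pop_size)]
        pop = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            if rng.random() < cx_rate:
                p1, p2 = uniform_crossover(p1, p2, rng)
            pop += [mutate(p1, mut_rate, rng), mutate(p2, mut_rate, rng)]
    return best_sol[0], best_sol[1], rmse_fn(*best_sol)
```

Calling `run_ga(surrogate_rmse)` returns the best `(B, L)` pair found together with its surrogate RMSE. In a real setting `rmse_fn` would train and validate the LSTM, which makes each fitness evaluation expensive, so caching the RMSE of already evaluated `(B, L)` pairs would be a natural refinement.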

CASE STUDY
In the case study, we employ route 35 to test the performance of the proposed approach. Route 35 serves passengers in a central business district (CBD) in Xi'an, China, as shown in Figure 4. The study route is 11 km long and serves 21 stops with a 3-minute headway during 7:00~9:00 and 18:00~20:00, and a 7-minute headway in other periods. The average stop spacing is 0.55 km, and the fleet size is 23 buses. Each bus has 38 seats and two doors, used separately for boarding and alighting passengers. The GPS data and ESC data associated with the study route were obtained during weekdays in May 2019.
The ESC data consists of detection date, detection time, boarding stop ID and card ID, as shown in Table 1, and is stored in an MS-Access database in the Xi'an traffic information centre in real time. There are 5,789 records for the weekdays in May 2019.
The GPS data consists of detection date, detection time, latitude, longitude and speed, which are reported at 15-second intervals, as shown in Table 2. There are 355,467 records stored in the MS-Access database.

Figure 4 - Configuration of route 35
It is challenging to accurately predict bus travel time in peak hours because of the variation of traffic congestion and passenger demand [46]. Thus, the performance of the proposed approach is assessed using peak-hour data in the morning (i.e. 7:00~9:00). The statistics of the data are given in Table 3, including the stop spacing and the average and standard deviation of link travel time and dwell time.

RESULTS AND DISCUSSION
The assessment was conducted using different combinations of data sets, including ESC data only, GPS data only and hybrid data (ESC and GPS data). We evaluated the prediction accuracy based on RMSE, as shown in Figure 5, which illustrates the RMSE of bus travel time from stop 1 to all downstream stops (i.e. stops 2 through 21). Figure 5 indicates that the RMSE increases with travel time. The RMSE with ESC data only is generally high and fluctuates significantly at stops with a high standard deviation of travel time, which can be attributed to the market penetration of smart cards (80% of passengers on the study route used smart cards). One can also observe that the RMSE of the approaches using ESC data only and GPS data only is higher than that of the approach using hybrid data, which reduces the RMSE by 44.11% and 25.47%, respectively. Although the RMSE of the hybrid approach shows an increasing trend with bus travel time, it is quite stable, with only a few spikes. This seems to be a significant benefit of adopting a deep learning approach, such as the LSTM model, to ensure higher prediction accuracy, because it can fine-tune the prediction result based on real-time data. With more accurate bus arrival information, passengers can schedule their departure time to reduce waiting time at stops effectively.
The assessment is also conducted using different deep learning methods, including LSTM, GRU and Bi-LSTM. The prediction accuracy is shown in Figure 6, which illustrates the RMSE of the bus travel time prediction using the different models from an origin stop (stop i) to stop i+1, together with the standard deviation of travel time on this path.
As shown in Figure 6, the proposed approach outperforms GRU and Bi-LSTM in prediction accuracy; the average RMSE is reduced by 0.86 s and 0.95 s, respectively. This is because the GRU is unstable on some paths with a higher standard deviation of travel time (i.e. those originating from stops 4 and 6). Since travel time is almost solely affected by leading buses on the study route, LSTM outperforms Bi-LSTM in prediction accuracy.
To analyse how GA can improve the accuracy of the prediction approach, the performance of LSTM with GA and LSTM with grid search is assessed using the hybrid data set. In the approach without GA, the hyperparameters of each LSTM model are tuned by a grid search algorithm. Figure 7 compares the performance of the proposed approach with GA and LSTM with grid search from an origin stop (stop i) to stop i+1.
As shown in Figure 7, the proposed approach outperforms the LSTM with grid search in prediction accuracy, and the average RMSE is reduced by 6.62%. This is because the GA could stably find the optimal solution (i.e. the hyperparameters for the proposed LSTM model). As a result, GA is important for training the hyperparameters of the LSTM model, which can improve the stability and accuracy of the proposed approach.
Figure 7 - RMSE of the approach with GA and LSTM with grid search
To test for overfitting, we used the data obtained on odd days on route 35 during the weekdays in May 2019 as the training data set. We optimised the hyperparameters of the LSTM model using the training data set. Then, we used the same hyperparameters to predict bus travel time on even days on route 35 during the weekdays in May 2019 (i.e. the test data set). The results show that the average RMSE values of the training data set and the test data set are 44.73 s and 45.21 s, respectively. The gap between the two is small (i.e. 0.48 s). Therefore, the parameters obtained by GA are stable for predicting bus travel time.
To analyse the impact of the available GPS data and ESC data (i.e. the buses equipped with GPS devices and the proportion of passengers who paid cash for tickets), the proposed approach is developed using hybrid data with various combinations of GPS and ESC data, as shown in Figure 8. Figure 8a shows how RMSE changes when prediction is performed using hybrid data with different numbers of buses equipped with GPS devices in the study fleet. As shown in Figure 8a, RMSE decreases significantly at the beginning (from 1 through 14 buses) as the number of buses equipped with GPS devices increases, and then decreases only slightly (from 14 through 23 buses). The agency may consider equipping 14 through 23 buses (i.e. more than 61% of the buses in the study fleet) subject to the budget constraint. Figure 8b shows how RMSE changes when the prediction is performed using hybrid data with different smart card market penetrations (i.e. the rate of available smart card data). In the case of 80% ESC market penetration, the available ESC data ranges from 0% to 80%. It was found that RMSE decreases as the available ESC data increases. When more than 50% of the ESC data is available, a significant improvement in prediction accuracy can be expected.
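One way such availability scenarios could be generated from a complete data set is sketched below in Python. The procedure is an assumption for illustration only (the text does not describe how the scenarios were produced): a given number of buses is drawn at random to carry GPS devices, and the observed ESC records, which already reflect the 80% market penetration, are thinned at random to a lower availability rate. All function names are hypothetical.

```python
import random

FLEET_SIZE = 23          # buses on the study route
ESC_PENETRATION = 0.8    # share of passengers using smart cards

def select_gps_buses(n_equipped, rng, fleet_size=FLEET_SIZE):
    """Randomly choose which bus IDs (1..fleet_size) carry a GPS device."""
    return set(rng.sample(range(1, fleet_size + 1), n_equipped))

def thin_esc_records(records, available_rate, rng, penetration=ESC_PENETRATION):
    """Keep each observed ESC record with probability available_rate / penetration,
    so the retained share of all boardings is roughly available_rate."""
    if not 0.0 <= available_rate <= penetration:
        raise ValueError("available_rate must lie between 0 and the penetration")
    keep_prob = available_rate / penetration
    return [r for r in records if rng.random() < keep_prob]

def equipped_share(n_equipped, fleet_size=FLEET_SIZE):
    """Fraction of the fleet equipped with GPS devices."""
    return n_equipped / fleet_size
```

With 14 of 23 buses equipped, `equipped_share(14)` is about 0.61, which matches the threshold read from Figure 8a. Repeating the random draw with several seeds and averaging the resulting RMSE would reduce the dependence on which particular buses or records happen to be dropped.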

CONCLUSION
In this study, real-world ESC and GPS data are used to develop the proposed LSTM model for predicting bus travel time with incomplete data. The results suggest that using hybrid data outperforms the approaches with ESC data only and GPS data only; the RMSE is reduced by 44.11% and 25.47%, respectively. The results also suggest that LSTM outperforms GRU and Bi-LSTM; the RMSE is reduced by 0.86 s and 0.95 s, respectively.
To improve the accuracy of the approach, a GA was developed, which effectively optimised the hyperparameters of the LSTM model. The resulting model outperforms the traditional LSTM model, whose hyperparameters are determined by a grid search, in terms of lower RMSE (reduced by 6.62%).
The results also suggest that prediction accuracy improves markedly when more than 61% of the buses are equipped with GPS devices and more than 50% of the ESC data is available. Figures 8a and 8b can serve as a guideline to equip GPS devices in a bus fleet subject to the available ESC data and the expected travel time prediction accuracy.
As immediate extensions of this study, the developed LSTM model can be enhanced by considering more realistic conditions. Avenues of future research may include the following: (i) considering link travel time uncertainties that may occur due to delay at intersections, weather characteristics and other factors; (ii) considering other routes which pass the same origin-destination (OD) pair (e.g. stops) to enrich the data set; (iii) improving the model performance based on other deep learning models (e.g. Bi-LSTM, Bi-GRU and GCN); and (iv) improving the performance of the prediction model by enhancing ESC data sets after the real-time data transmission technique has been improved.

ACKNOWLEDGEMENT
This paper was supported by the National Natural Science Foundation of Shaanxi, China, under grants 2020JQ-399 and 2021JZ-20. The authors would also like to thank the Xi'an public transport company and the Xi'an traffic information centre for providing the data for this research.