Estimating Factors Influencing the Capacity of the Wide-Road-and-Narrow-Bridge Section Based on Random Forest

To modernise rural roads and improve their capacity, bottleneck sections need to be addressed urgently. Wide-road-and-narrow-bridge sections are a particularly prominent bottleneck on rural roads. This paper therefore combines VISSIM simulation with the random forest algorithm to analyse how strongly the roadway one-way lane width, the bridge deck one-way lane width, the motorised vehicles to non-motorised vehicles ratio and the road-bridge connection dimension influence the capacity of the wide-road-and-narrow-bridge section. The coefficient of determination ($R^2$) of the random forest-based capacity prediction model shows that the random forest fits the data very well. In descending order of influence on capacity, the factors are the bridge deck one-way lane width, the motorised vehicles to non-motorised vehicles ratio, the roadway one-way lane width and the road-bridge connection dimension. The model can provide a reference for improving the capacity of bottleneck sections of rural roads, and its importance ranking can inform the order in which measures are taken when rural roads are rebuilt and expanded.


INTRODUCTION
In recent years, the accelerating pace of rural revitalisation has greatly improved rural transportation and promoted the development of the rural economy, and the state is paying increasing attention to transport conditions in rural areas. The State Council's 14th Five-Year Plan for Promoting Agricultural and Rural Modernisation of China, among other documents, has given priority to the construction of rural roads [1]. However, rural roads still have inherent problems, such as generally low design levels, poor implementation of technical standards, insufficient funds for construction and maintenance and complex road conditions, which continue to limit the development of rural transport. With the rapid growth of China's national economy, both vehicle ownership and the variety of vehicle models in rural areas have increased greatly. As a result, bottlenecks on low-grade roads, such as narrow bridges, increasingly affect regional traffic. When rural roads are widened and reconstructed, bridges are usually not rebuilt accordingly because of financial constraints and outdated planning and design, resulting in wide-road-and-narrow-bridge sections. In regulations such as "Road Traffic Signs and Markings", a narrow bridge is defined as a bridge whose road surface is less than 6 m wide [2]. To keep rural roads flowing smoothly and improve travel conditions for residents, the structure of the rural road network needs to be optimised. The wide-road-and-narrow-bridge section is a bottleneck that impedes smooth passage on rural roads and cannot be ignored when rural roads are upgraded.

LITERATURE REVIEW
Many studies in China and abroad have examined the capacity of bottleneck road sections. Okamura et al. [3] collected traffic flow data from bottleneck sections of a Japanese suburban expressway and analysed possible causes of congestion and their relationship with road capacity. Wu et al. [4] introduced asymmetric driving theory and improved the Newell following model to reveal more accurately how the capacity of a highway bottleneck section varies. Raju et al. [5] used simulation to quantify the impact of a bottleneck in the middle section of a road; they found that the section just upstream was strongly affected by the reduced capacity, while the downstream capacity did not exceed the capacity of the bottleneck section. However, these studies all concerned conventional highways or urban roads with two or more lanes and lane widths greater than 3 m, whereas few studies have addressed wide-road-and-narrow-bridge sections of rural roads [6]. As vehicle ownership in rural areas gradually increases, it will become even harder to guarantee efficient, smooth and safe traffic on such sections, which makes them a bottleneck for rural revitalisation. Therefore, studying the capacity of narrow bridges on low-grade rural highways is of great practical significance.
Most traditional methods [7][8][9] for analysing the capacity of bottleneck sections require a large amount of surveyed traffic data, including traffic volume and speed. Moreover, field surveys are more laborious and difficult on two-lane roads with strong interference from opposing traffic. The VISSIM simulation software [10] can quickly build a complex road traffic model from approximate field data, compensating for data that are distorted or missing because of accidents or a lack of manpower during the survey. In addition, the random forest (RF) algorithm [11] can handle both discrete and continuous data and trains quickly, which makes it well suited to traffic data. This paper therefore proposes a method that combines the RF algorithm with VISSIM simulation to analyse the factors influencing the capacity of wide-road-and-narrow-bridge sections and their importance.

SELECTION OF INFLUENCING FACTORS
This study is based on a two-lane wide-road-and-narrow-bridge section of County Road X428, a rural tertiary road in Ningdu County, Ganzhou City, Jiangxi Province. The current traffic situation of County Road X428 was analysed using field survey data, and the main factors influencing the capacity of the wide-road-and-narrow-bridge section were selected to provide data support for the subsequent simulation validation.

Overview of the study section
County Road X428 is the main route connecting the countryside to the county town. There is no central divider separating the eastbound and westbound lanes, and the road accommodates normal traffic flow in both directions. Because the road runs between a mountainside and the Qin River, vehicular traffic is not affected by entrances and exits on either side of the road. Moreover, the road is too narrow for overtaking. The one-way lane width is less than 3 m along the whole study section. The widest road cross-section is 5.2 m, with a one-way lane width of 2.6 m, while the bridge is the narrowest part, with a cross-section width of 4.4 m and a one-way lane width of 2.2 m. In addition, there is free space on both sides of the road and there are pavements on both sides of the bridge for pedestrians, so vehicles are unaffected by pedestrians. County Road X428 is a grade three highway with a design speed of 40 km/h and a speed limit of 60 km/h. The bridge has a height limit of 2.2 m, so large vehicles cannot pass. However, non-motorised vehicles are common in rural areas, and the road is a mixed motorised and non-motorised section with heavy non-motorised traffic. The reconstruction diagram of the site is shown in Figure 1a and the satellite map is shown in Figure 1b.

Data collection and analysis
In this study, the road and bridge sections were surveyed simultaneously using manual survey methods to observe and record traffic volumes, location speeds and time headways at the observation points.
Traffic on County Road X428 consists mainly of small vehicles; because of the 2.2 m height limit, there is essentially no large-vehicle traffic. Because it is a rural road, motorised and non-motorised vehicles share the same lanes. Furthermore, the number of non-motorised vehicles is relatively large, at times exceeding the number of motorised vehicles. Unlike urban roads, the road has no fixed morning and evening peak periods, although around the Chinese New Year its peak hours are similar to those of urban roads. Time headway and speed are vital evaluation indices for two-lane highways and play an important role in studying the level of service and capacity of the road. Based on two hours of video recording and the measurements of the location speed meter, the data for each observation section are shown in Table 1. As Table 1 shows, the location speed and the average time headway of vehicles on the road section are greater than those on the bridge section, which indicates that drivers decelerate when passing through the road-bridge connecting section because the lane ahead narrows. The ratio of motorised to non-motorised vehicles is close to 1:1, and the proportion of non-motorised vehicles is sometimes even higher than that of motorised vehicles.

Analysis of selected influencing factors
Because funds for rural roads are limited, bridge lanes often cannot be widened at the same time as the roads are rebuilt or expanded. As a result, the bridge is narrower than the road surface, producing a wide-road-and-narrow-bridge section. This phenomenon is particularly common on rural roads. The field survey of the wide-road-and-narrow-bridge section shows that the one-way lane width of the road is less than 3 m and that the bridge lane is narrower than the roadway lane. The location speed and average time headway of vehicles on the road are greater than those on the bridge section, which indicates that drivers decelerate because the road ahead narrows. Therefore, the one-way lane width and the road-bridge connection dimension are the key points of this study. Because of the limited number and width of lanes, overtaking is not possible when there is traffic in the opposite direction. Since the height limit at the bridge is 2.2 m, only small cars and non-motorised vehicles can pass. According to the traffic flow data from the field survey, non-motorised vehicle demand is high and the ratio of motorised to non-motorised vehicles approaches 1:1. Furthermore, the road has no centreline, and the driving behaviour of non-motorised vehicles has a great impact on capacity. Thus, the ratio of motorised to non-motorised vehicles is also a significant factor affecting road capacity. To sum up, the selected influencing factors of the wide-road-and-narrow-bridge section are the one-way lane width of the roadway, the one-way lane width of the bridge, the ratio of motorised to non-motorised vehicles and the road-bridge connection dimension.

SIMULATION FEASIBILITY ANALYSIS
The workload and difficulty of manual research are substantial, so the obtained data is characterised by complexity and uncertainty. In order to analyse the impact of each combination of factors on the traffic capacity of the wide-road-and-narrow-bridge section through the VISSIM simulation software, the feasibility of the model should first be verified.

VISSIM-based model for the wide-road-and-narrow-bridge section
Based on the surveyed evening peak data, the traffic scene of the wide-road-and-narrow-bridge section is reconstructed and the driving behaviour is simulated using the VISSIM simulation software. The road-bridge connection is modelled in VISSIM to reproduce the wide-road-and-narrow-bridge section, and the relative error between the actual and simulated values is used as an evaluation index to verify the accuracy of the model. The simulation model diagram of the wide-road-and-narrow-bridge section is shown in Figure 2.

Data processing and analysis of results
The simulated data were compared with the measured data. The Geoffrey E. Havers (GEH) statistic avoids some of the pitfalls of using simple percentages to compare two sets of traffic volumes. A GEH of less than 5.0 indicates that the modelled hourly traffic volume matches the observed hourly traffic volume. The GEH statistic and the relative error are used to verify the feasibility of the model. The GEH is calculated as follows:

$$GEH = \sqrt{\frac{2(M - C)^2}{M + C}}$$

where M denotes the simulated traffic volume and C denotes the actual traffic volume. The validation results are shown in Table 2. According to the simulation validation data in Table 2, the relative error of the validation indicators for each observation point is less than 10% and the GEH statistic for each traffic volume is less than 5. This indicates that the VISSIM simulation model reproduces the field survey data well and can be used for the subsequent study of the influencing factors.
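The validation check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard form of the GEH statistic, GEH = sqrt(2(M - C)^2 / (M + C)), and the 5.0 and 10% thresholds quoted in the text. The function names and the two traffic volumes are hypothetical and are not the values in Table 2.

```python
import math


def geh(simulated: float, observed: float) -> float:
    """GEH statistic for a pair of hourly traffic volumes (veh/h)."""
    if simulated + observed == 0:
        return 0.0  # both volumes are zero, so there is no discrepancy
    return math.sqrt(2.0 * (simulated - observed) ** 2 / (simulated + observed))


def relative_error(simulated: float, observed: float) -> float:
    """Absolute relative error of the simulated value against the observed one."""
    return abs(simulated - observed) / observed


def is_validated(simulated: float, observed: float) -> bool:
    """Apply the two acceptance thresholds quoted in the text."""
    return geh(simulated, observed) < 5.0 and relative_error(simulated, observed) < 0.10


# Hypothetical volumes for illustration only (NOT the values in Table 2).
m_sim, c_obs = 420.0, 400.0
print(f"GEH = {geh(m_sim, c_obs):.3f}, relative error = {relative_error(m_sim, c_obs):.1%}")
print("accepted:", is_validated(m_sim, c_obs))
```

Unlike a plain percentage, the GEH statistic scales with the magnitude of the volumes, so a 20 veh/h gap counts for more on a lightly trafficked section than on a busy one.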

FACTOR IMPORTANCE ANALYSIS BASED ON THE RF MODEL
The RF model is a decision tree-based machine learning algorithm proposed by Breiman, which randomly selects data and optimal features and constructs several different sub-decision trees for training to improve the diversity of the system and thus the accuracy of the model [12]. At the same time, the RF model can calculate the importance of the feature based on the contribution of the feature in each decision tree and can rank the importance of the factors affecting the capacity of wide-road-and-narrow-bridge sections. The RF models are fast to train, generalise well, resist noise and generally do not over-fit.

Selection of factor levels
According to the site survey, the four factors examined in the study of the capacity of the wide-road-and-narrow-bridge section are the roadway one-way lane width, the bridge deck one-way lane width, the motorised vehicles to non-motorised vehicles ratio and the road-bridge connection dimension. Several levels of each factor are selected for the combination tests. The road-bridge connection dimension is the length of the tapering section that connects the wide road to the narrow bridge. The relevant specification stipulates a general value of 20 m for the tapering section of low-grade rural roads, so the factor levels are set to 20 m and 30 m. On rural roads, the number of non-motorised vehicles is very large and is a main factor affecting capacity. Ten levels are selected for the ratio of motorised to non-motorised vehicles, with the share of non-motorised vehicles varying from 0% to 90% in 10% steps. Taking into account the specification and actual rural road conditions, the largest one-way lane width selected is 3.25 m. The factor levels and combinations of the roadway one-way lane width and the bridge deck one-way lane width are shown in Table 3. A model of each combination is built in the VISSIM simulation software. To reproduce the behaviour of vehicles slowing down and giving way on narrow bridges, deceleration and priority rules were set on the bridge deck part of the models, and all other traffic conditions were kept the same in every combination. The traffic volume was gradually increased in the simulation until the road could no longer pass all vehicles. The traffic flow at this point was recorded as the capacity of the road-bridge connecting section.
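The capacity search described above (raise the input volume until the section can no longer pass every vehicle, then record the flow) can be sketched as a simple loop. This is only an outline of the procedure, not the authors' VISSIM setup: `run_simulation` stands in for a full simulation run, and the step size, starting demand and toy saturation value are all hypothetical.

```python
from typing import Callable


def find_capacity(run_simulation: Callable[[int], int],
                  start: int = 100, step: int = 50, max_demand: int = 5000) -> int:
    """Raise the input demand (veh/h) until throughput falls below demand.

    Returns the throughput measured at the first saturated run, which is
    taken as the capacity of the section.
    """
    for demand in range(start, max_demand + 1, step):
        throughput = run_simulation(demand)
        if throughput < demand:  # not every vehicle got through
            return throughput
    raise ValueError("demand range exhausted before the section saturated")


def toy_simulation(demand: int) -> int:
    """Stand-in for one simulation run; the saturation flow is a made-up number."""
    hypothetical_saturation_flow = 830  # veh/h, illustration only
    return min(demand, hypothetical_saturation_flow)


capacity = find_capacity(toy_simulation)
print("estimated capacity (veh/h):", capacity)
```

A smaller step gives a finer estimate at the cost of more simulation runs; in practice each run would also be repeated with several random seeds and averaged.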
Since there are seven levels for the one-way lane width combinations, ten levels for the ratio of motorised to non-motorised vehicles and two levels for the road-bridge connection dimension, the simulation models yielded 7 × 10 × 2 = 140 sets of experimental data. The capacity data for each combination are shown in Figure 3.
The meaning of the legends in Figure 3 is illustrated by the example 3.25-3.25-20. The first 3.25 indicates the roadway one-way lane width, the second 3.25 indicates the bridge deck one-way lane width and 20 denotes the dimension of the road-bridge connection section. Therefore, the first legend represents the road-bridge connection model in which the roadway one-way lane width is 3.25 m, the bridge deck one-way lane width is 3.25 m and the road-bridge connection is 20 m long. The other legends can be interpreted similarly.
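As a quick check of the experiment count, the scenario grid can be enumerated as a full factorial design. The sketch below assumes seven road/bridge width combinations, ten non-motorised shares and two taper lengths. The labels `W1` to `W7` are placeholders, because the actual width pairs are listed in Table 3 and are not reproduced here; the `legend` helper is a hypothetical name for the Figure 3 labelling rule.

```python
from itertools import product

# Placeholder labels for the seven road/bridge width combinations of Table 3.
width_combinations = [f"W{i}" for i in range(1, 8)]
# Share of non-motorised vehicles: 0% to 90% in 10% steps (ten levels).
non_motorised_share_pct = list(range(0, 100, 10))
# Length of the road-bridge tapering section in metres (two levels).
taper_lengths_m = [20, 30]

scenarios = list(product(width_combinations, non_motorised_share_pct, taper_lengths_m))
print(len(scenarios), "simulation scenarios")


def legend(road_width_m: float, bridge_width_m: float, taper_m: int) -> str:
    """Build a Figure 3 style legend: road width-bridge width-taper length."""
    return f"{road_width_m:.2f}-{bridge_width_m:.2f}-{taper_m}"


print(legend(3.25, 3.25, 20))
```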

Figure 4 - Flowchart of the RF model execution
Assume that the original data set consists of N samples, each with M features. The flowchart of the RF model execution is shown in Figure 4, and its main steps are as follows:
1) Using the bootstrap method, K sample datasets are randomly drawn from the original dataset of N samples as the sub-training sets of the decision trees, with each draw containing approximately 2/3 of the samples in the original dataset.
2) At each node, m features (0 < m < M) are randomly selected from the sampled data as the candidate split feature set, and the optimal feature in this subset is chosen for node splitting and branching according to the Gini index minimisation criterion.
3) Each tree grows recursively from top to bottom until it reaches the set minimum leaf node size, at which point it stops growing; all the decision trees are then combined into the RF.
4) The test data are input into the RF model and predicted separately by the K decision trees, and the average of the trees' predictions is taken as the final prediction.
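The sampling and aggregation steps above (steps 1, 2 and 4) can be illustrated with the Python standard library alone. This is a sketch and not the authors' implementation. It assumes the common bootstrap convention of drawing N indices with replacement, which leaves roughly 63% distinct samples per tree and is in line with the "approximately 2/3" mentioned above. N = 112 and M = 4 follow the paper, while K, m, the seed and the function names are assumed for illustration. Tree growing (step 3) is omitted.

```python
import random
from statistics import mean

N, M = 112, 4           # training samples and features, as stated in the paper
K, m = 100, 2           # number of trees and candidate features per split (assumed)
rng = random.Random(0)  # fixed seed so the sketch is repeatable


def bootstrap_indices(n_samples, rng):
    """Step 1: draw n_samples row indices with replacement for one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_features(n_features, n_candidates, rng):
    """Step 2: pick the random feature subset considered at one node."""
    return rng.sample(range(n_features), n_candidates)


def forest_predict(tree_predictions):
    """Step 4: average the capacity predictions of the individual trees."""
    return mean(tree_predictions)


# Average share of distinct original samples that ends up in one tree's training set.
unique_share = mean([len(set(bootstrap_indices(N, rng))) / N for _ in range(K)])
print(f"average share of distinct samples per tree: {unique_share:.2f}")
print("candidate feature indices at one node:", candidate_features(M, m, rng))
```

The samples left out of a given tree's bootstrap draw are its out-of-bag samples, which is one reason RF models are comparatively robust to over-fitting.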
Based on the road survey, this study selected the roadway one-way lane width, the bridge deck one-way lane width, the motorised vehicles to non-motorised vehicles ratio and the road-bridge connection dimension as the input features of the model, and the capacity as the output variable. To build the RF model of the wide-road-and-narrow-bridge sections in Python, 80% of the 140 sets of simulation data are randomly selected as the training set and the remaining 20% are used as the test set. Therefore, N and M in this paper are 112 and 4, respectively.
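The 80/20 split can be sketched as a shuffled index partition. This is a minimal standard-library illustration of how 140 samples yield 112 training and 28 test samples; the helper name and the seed are assumptions, and the paper does not state which library routine was actually used.

```python
import random


def split_indices(n_samples: int, test_fraction: float, seed: int):
    """Shuffle the sample indices and cut them into training and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(140, 0.2, seed=42)  # the seed is arbitrary
print(len(train_idx), "training samples,", len(test_idx), "test samples")
```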

Model accuracy tests
Based on the RF model of the road-bridge connection sections constructed above, predictions are made for 28 sets of data from the test set. The traffic capacity at this point is predicted by inputting data for four influencing factors: road one-way lane width, bridge deck one-way lane width, motorised vehicles to non-motorised vehicles ratio and road-bridge connection dimension. Furthermore, a graph is derived comparing the predicted values with the simulated values, where the simulated values are based on 20% of the values in the test set from the VISSIM data. A scatter plot (x-axis -simulated values, y-axis -predicted values) comparing the predicted values of the model with the simulated values of the test set is shown in Figure 5.
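The coefficient of determination $R^2$ is used to evaluate the goodness of fit and is calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (o_i - P_i)^2}{\sum_{i=1}^{n} (o_i - \bar{o})^2}$$

where $o_i$ and $P_i$ are the simulated and predicted values of the capacity, respectively, $\bar{o}$ is the mean of the simulated values and $n$ is the number of samples in the test set.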
As can be seen from Figure 5, for each set of data in the test set, the predicted and simulated values of the capacity of the wide-road-and-narrow-bridge sections are very close to each other, and the coefficient of determination is as high as 0.9868. This shows that the RF model predicts the capacity of the wide-road-and-narrow-bridge sections well even under multifactor coupling and complex non-linear relationships.
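For readers who want to reproduce the accuracy measure, the coefficient of determination can be computed directly from the simulated and predicted capacities. The sketch below assumes the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares. The four value pairs are invented for illustration and are not taken from the test set.

```python
def r_squared(simulated, predicted):
    """Coefficient of determination of predictions against simulated values."""
    mean_simulated = sum(simulated) / len(simulated)
    ss_residual = sum((o - p) ** 2 for o, p in zip(simulated, predicted))
    ss_total = sum((o - mean_simulated) ** 2 for o in simulated)
    return 1.0 - ss_residual / ss_total


# Hypothetical capacities in veh/h (NOT the paper's test data).
simulated_capacity = [600.0, 700.0, 800.0, 900.0]
predicted_capacity = [610.0, 690.0, 805.0, 895.0]
print(f"R^2 = {r_squared(simulated_capacity, predicted_capacity):.4f}")
```

A value close to 1 means that the prediction errors are small relative to the spread of the simulated capacities.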

Analysis of the importance of influencing factors
A key advantage of the RF model is that it can assess the importance of each variable for the simulation results. The RF model in this paper uses the Gini index to measure the effect of each variable on the heterogeneity of the observations at each node of the decision tree; the smaller the value of this index, the higher the purity. The Gini-based scores of each feature are then normalised and plotted as a histogram of relative importance coefficients, where a larger score indicates a greater influence of the variable on the model results. Let $X_1$, $X_2$, $X_3$ and $X_4$ denote the roadway one-way lane width, the bridge deck one-way lane width, the motorised vehicles to non-motorised vehicles ratio and the road-bridge connection dimension, respectively. The importance scores of the influencing factors are calculated in the following steps:
1) Calculate the Gini index $GI_{kq}$ of node $q$ in the $k$th decision tree, where node $q$ is split on $X_i$.
2) Calculate the change in the Gini index, $VIM_{ikq}$, of $X_i$ before and after the branching of node $q$ in the $k$th decision tree, which is its importance at that node. The $VIM_{ikq}$ is calculated as follows:

$$VIM_{ikq} = GI_{kq} - GI_{kl} - GI_{kr}$$

where $GI_{kl}$ and $GI_{kr}$ denote the Gini indices of the two child nodes produced by the branching.
3) Add the importance scores of all nodes of the $k$th decision tree at which $X_i$ is used for branching, which gives the importance $VIM_{ik}$ of $X_i$ in that tree. The $VIM_{ik}$ can be described as follows:

$$VIM_{ik} = \sum_{q \in Q} VIM_{ikq}$$

where $Q$ is the set of nodes in the $k$th decision tree that are split on $X_i$.
The results of the feature scores for each factor are shown in Figure 6. As can be seen from Figure 6, the factors in descending order of importance are the bridge deck one-way lane width, the motorised vehicles to non-motorised vehicles ratio, the roadway one-way lane width and the road-bridge connection dimension, with scores of 0.446, 0.383, 0.169 and 0.002, respectively.
Therefore, the results show that the bridge deck one-way lane width has the greatest impact on the capacity of the wide-road-and-narrow-bridge sections, followed by the motorised vehicles to non-motorised vehicles ratio and the roadway one-way lane width, while the road-bridge connection dimension has the least impact.
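The normalisation and ranking of the importance scores can be sketched as follows. The four normalised scores are those reported in the text. The raw per-feature totals are hypothetical (the reported shares multiplied by an arbitrary scale factor) and serve only to show that dividing by the sum recovers shares that add up to 1.

```python
def normalise(raw_scores):
    """Scale raw importance totals so that they sum to 1."""
    total = sum(raw_scores.values())
    return {name: value / total for name, value in raw_scores.items()}


# Normalised importance scores reported in the text.
reported = {
    "roadway one-way lane width": 0.169,
    "bridge deck one-way lane width": 0.446,
    "motorised to non-motorised vehicles ratio": 0.383,
    "road-bridge connection dimension": 0.002,
}

# Hypothetical raw totals: the reported shares times an arbitrary scale factor.
raw = {name: 37.5 * score for name, score in reported.items()}
shares = normalise(raw)

ranking = sorted(shares, key=shares.get, reverse=True)
for name in ranking:
    print(f"{name}: {shares[name]:.3f}")
```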

CONCLUSION
The actual two-lane scenario of the wide-road-and-narrow-bridge section was reconstructed using the VISSIM simulation software. A comparison of the observed vehicle behaviour with the driving behaviour data simulated in VISSIM shows that the software reproduces the scenario well. On this basis, this paper proposes a method that combines the RF algorithm with VISSIM to analyse the factors influencing the capacity of wide-road-and-narrow-bridge sections on rural roads. The results show that the coefficient of determination of the RF model is 0.9868, which is very close to 1, indicating that the model fits well. In addition, the analysis shows that the order of influence on the capacity of the wide-road-and-narrow-bridge sections is bridge deck one-way lane width > motorised-to-non-motorised vehicles ratio > roadway one-way lane width > road-bridge connection dimension.
To sum up, promoting the construction of rural roads requires improving the capacity of the wide-road-and-narrow-bridge section. First, if funds are sufficient, the lanes of the road and bridge can be widened. Otherwise, the sidewalk on the bridge can be converted into a lane, which is equivalent to widening the bridge deck, and pedestrians can be guided to alternative routes. Second, the regulation of non-motorised vehicles in rural areas should be strengthened. Policies should be implemented to restrict the passage of non-motorised vehicles and guide them onto non-critical sections, which can reduce their impact on critical sections of rural roads. Finally, as the length of the road-bridge connection section does not significantly affect the capacity, it can be set according to the specifications.
The results of the RF model show that the method can accurately and objectively predict the traffic capacity of a rural bottleneck section in multi-factor scenarios, which demonstrates the feasibility and effectiveness of the RF model for rural bottleneck sections. Because of its adaptability and stability, the RF model could also be generalised to studies of other bottleneck road sections. However, when RF models are used to study different bottleneck sections, the influencing factors specific to each environment need to be considered. Therefore, future research can include more influencing factors of bottleneck sections to further improve the applicability of the model.