Optimised LSTM Neural Network for Traffic Speed Prediction with Multi-Source Data Fusion

Predicting traffic speed accurately and in real time is crucial for the development of smart transportation systems. Given the nonlinear and stochastic nature of vehicle data, integrating diverse spatio-temporal data sources with Improved Particle Swarm Optimisation (IPSO) offers a promising approach to optimising the Long Short-Term Memory (LSTM) neural network. Firstly, we enhance the optimisation capabilities of PSO by implementing a nonlinear inertia weight and adaptive variation. Secondly, to address the challenge of selecting the LSTM hyperparameters, the PSO algorithm identifies globally optimal hyperparameter settings through iterative training. Subsequently, we conduct a case study using multi-source spatio-temporal traffic speed data, comparing our proposed IPSO-LSTM model with traditional neural network prediction models and advanced models. The experimental results demonstrate that the IPSO-LSTM model addresses the issues of parameter selection and inaccurate prediction encountered by traditional LSTM models in traffic state prediction. Moreover, it enhances the model's ability to capture speed time series dynamics. Notably, in processing complex speed data, our model exhibits superior accuracy and stability in prediction.


INTRODUCTION
In recent years, with the steady increase in the number of automobiles on roads, traffic congestion has escalated significantly. Accurate and real-time prediction of vehicle speed plays a pivotal role in travellers' route planning and time management endeavours. This predictive capability not only mitigates congestion during peak hours but also furnishes crucial support for traffic management authorities in devising proactive strategies. Accurately anticipating road speeds facilitates pre-emptive measures to circumvent congestion before it materialises. Hence, the development of speed prediction algorithms utilising vehicle driving data carries profound practical implications for the sophisticated control of highway vehicles.
In the domain of automotive intelligence, endeavours such as vehicle safety-assisted driving and intelligent vehicle behaviour decision analysis necessitate the examination of vehicle driving data to formulate corresponding strategies. The timelier and more precise the data acquisition, the higher the quality of ensuing decisions. Through speed prediction algorithms, driving data for future periods can be relayed to vehicle decision-making entities for analysis, thereby enabling the formulation of optimal behavioural strategies. Consequently, speed prediction algorithms emerge as indispensable components of vehicle intelligence.
Intelligent Transportation Systems (ITS) stand to enhance performance by assimilating future road network data. Effective traffic prediction requires the comprehension of the non-linear relationship between past and future traffic conditions. This functionality confers benefits across various applications, encompassing route guidance and congestion management [1]. This study endeavours to augment traffic prediction accuracy and minimise network operation costs, thereby furnishing insights pertinent to traffic management and control.
The remainder of the paper is structured as follows: Section 2 provides a comprehensive review of the pertinent literature. Section 3 elucidates the prediction model based on deep learning and the enhanced particle swarm optimisation algorithm, and delineates the algorithmic steps in detail. Section 4 elaborates on the data format and processing procedures. Section 5 undertakes numerical calculations and conducts an analysis of the results. Finally, in Section 6, the research findings are summarised, along with an examination of limitations and potential avenues for future research.

LITERATURE REVIEW
There are two standard categories of methods for predicting traffic conditions. The first involves parametric approaches such as autoregressive integrated moving average (ARIMA), multiple regression theory, exponential smoothing, grey model and Markov model, among others. The second encompasses non-parametric methods such as the k-nearest neighbour model, neural networks, and more.
Among them, the parametric method helps elucidate the quantitative relationship between independent and dependent variables, and quantitatively analyse the influence of independent variables on dependent variables. Chandra and Al-Deek considered the correlation of traffic data (especially speed) between different locations, and introduced the VAR model as the prediction model of expressway section speed [2]. Sun and Sun introduced a real-time collision prediction model employing a dynamic Bayesian network (DBN). Their goal was to analyse the relationship between dynamic traffic changes and collisions, specifically using expressway traffic speed data for real-time accident prediction [3]. Wang et al. merged empirical mode decomposition (EMD) with autoregressive integrated moving average to forecast and analyse non-stationary and nonlinear traffic speed data series [4]. Zhang et al. developed a traffic speed prediction framework employing high-order Markov chains. This framework encompassed both observable factors like traffic volume and speed, as well as latent environmental factors [5]. Zeng et al. introduced the DWT-Bi-LSTM model, leveraging wavelet transform and bidirectional LSTM, to enhance accuracy and efficiency in predicting parking spot availability from past data [6]. Ma et al. introduced a novel wavelet-based spatiotemporal multi-graph convolutional neural network to improve the precision of traffic speed prediction. By integrating multiple GCNs, 3DCNN and wavelet analysis, their model exhibited superior performance compared to standard methods across diverse time spans and real-world datasets [7].
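Non-parametric methods adapt internal model parameters based on data to establish relationships between variables and dependent variables. They are less influenced by user intervention, thus exhibiting broader applicability. Asif et al. and Yao et al. proposed traffic speed prediction models with a single time step length that comprehensively considered spatio-temporal parameters on the basis of the support vector machine (SVM) algorithm, so as to support travellers' route selection and traffic guidance [8-9]. Tang et al. proposed a new fuzzy artificial neural network structure to predict multi-step forward driving speed based on three remote traffic microwave sensors in the southern section of the fourth Ring road in Beijing [10]. Zang et al. introduced an MSTFLN model designed for long-term expressway traffic speed prediction. This model extracts intricate speed features by jointly learning from multi-scale inputs [11]. Zheng et al. proposed a tensor-based k-nearest neighbour (K-NN) method, in which the traffic mode involved multidimensional time information and bidirectional spatial information [12]. Zhang et al. adjusted the hyperparameters of the multi-task learning (MTL) model by using the Bayesian optimisation method in combination with GPS data of a taxi, and concluded that the MTL model is leveraging deep learning [13]. Zhang et al. and Zheng et al. introduced the attention-graph convolution sequence-to-sequence model (AGC-SEQ2SEQ), a deep learning framework aimed at capturing intricate non-stationary temporal dynamics and spatial correlations for multi-step traffic condition prediction [14][15]. Ma et al. devised a short-term traffic speed prediction technique utilising a hybrid spatio-temporal feature selection algorithm (STFSA), which integrates traffic flow spatio-temporal analysis. They combined convolutional neural network and gated recurrent unit (CNN-GRU) methodologies for this purpose [16]. Ma et al. introduced swarm intelligence optimised deep extreme learning machines (DELM) and applied them, along with variational mode decomposition (VMD), to road traffic prediction and autonomous lane change decisions [17]. While neural networks offer various advantages, their predictive capacity diminishes when handling long time series data, thus prompting the introduction of the Long Short-Term Memory (LSTM) neural network.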
In order to verify the effectiveness of the LSTM neural network in predicting traffic speed, Ma et al. employed the travel speed data of Beijing traffic microwave detectors to conduct model training and testing [18]. Jia et al. assessed the efficacy of both deep belief networks (DBN) and LSTM models in short-term traffic speed prediction. Findings indicated the LSTM's superior capability in capturing the temporal features of traffic speed data compared to DBN [19]. Gu et al. fused the LSTM with gated recurrent unit (GRU) neural networks to create a two-layer deep learning framework (FDL) [20]. Hu et al. incorporated the LSTM to address the challenge of vanishing gradients in practical applications, integrated the attention mechanism into the LSTM-RNN, and devised a high-precision short-term traffic flow prediction model [21]. Meng et al. introduced a dynamic time warping algorithm for time series processing and proposed a long short-term memory (D-LSTM) model with dynamic time warping [22]. Zeng et al. proposed a hybrid GRU-LSTM model using historical parking data and various factors to forecast parking availability, aiming to optimise resource use and ease traffic congestion amid growing urbanisation and vehicle numbers [23]. Ma et al. introduced an improved short-term traffic flow prediction model that combines time series analysis with an enhanced LSTM network, specifically leveraging bidirectional LSTM (BiLSTM) [24].
While neural networks offer various advantages, their ability to predict over extended time series is limited. This led to the proposal of the Long Short-Term Memory (LSTM) neural network. Despite the LSTM's strong performance, enhancing its predictive accuracy remains a concern due to challenges in determining the optimal number of hidden layers and neurons within those layers. Additionally, setting the appropriate learning rate and iteration count poses difficulties. The number of hidden layers and neurons significantly impacts the model's fitting ability, whereas the learning rate and iteration count influence the training process and the model's effectiveness. In practical applications, determining these parameters relies on empirical experience, resulting in considerable randomness that diminishes the predictive capability of the model.
Moreover, within intricate urban road networks, the traffic state is subject to numerous interrelated factors that exhibit complex nonlinear relationships with speed. Among them, weather conditions, air quality, holiday activities, peak hours and other factors have a significant impact on urban traffic conditions. Lin et al. combined diverse new data sources with conventional data, devising a probabilistic framework encompassing a location decomposition model, a traffic topic model and a traffic speed Gaussian process model. The framework was termed the theme-enhanced Gaussian process aggregation model (TEGPAM) [25]. Essien et al. built an LSTM model by merging urban road network traffic and weather datasets from Greater Manchester, UK. Experimental results highlighted that temperature variations significantly influenced traffic conditions, indicating its importance as a variable within the model [26]. Li et al. (2019) extracted representative features from the data for fusion, established a prediction model to capture correlation, and used heterogeneous data to predict the space-average velocity [27]. While numerous deep learning methods exist for traffic flow prediction, current research predominantly focuses on univariate traffic time series data. Research on comprehensive traffic prediction models that integrate multi-source characteristic data fusion remains limited.
To sum up, most studies involve parametric and non-parametric prediction models, while some hybrid models are also presented. The deep learning model appears as a "black box" in the learning process and often lacks optimisation for internal parameter design. In addition, in previous studies, the traffic speed characteristics considered were relatively limited, which could not provide good prediction accuracy. Therefore, based on the actual datasets of Guangzhou traffic speed and weather conditions, this paper comprehensively considers the spatio-temporal factors that may affect urban traffic speed, and designs an urban road short-term traffic speed prediction method based on the improved PSO-optimised LSTM model. The results may provide potential insights for intelligent transportation systems in traffic state prediction.

LSTM
The LSTM, a specialised type of RNN, adeptly captures long-term dependencies while largely avoiding the vanishing gradient issue. It incorporates a memory cell structure within the hidden layer neurons of the RNN to retain past information. Additionally, it integrates three gate structures (input gate, forget gate and output gate) to regulate the utilisation of historical data.
If the input sequence is $x_1, x_2, \ldots, x_T$ and the state of the hidden layer is $a_1, a_2, \ldots, a_Q$, then at time $q$ the gates and states are computed by Equations 1-5, where $i_q$, $\lambda_q$ and $j_q$ are the input gate, the forget gate and the output gate respectively; $\delta_q$ is the cell unit; $s_\alpha$ is the weight of the recursive connection; $s_x$ is the weight from the input layer to the hidden layer; $b_i$, $b_\lambda$, $b_\delta$ and $b_j$ are the thresholds of each function; $\sigma(\cdot)$ and $g(\cdot)$ are the sigmoid and tanh functions respectively; and $\circ$ is the element-wise product.
To align the LSTM with the prediction objective, an additional linear regression layer is incorporated, namely:
$$\xi_q = s_\xi a_q + b_\xi \tag{6}$$
where $\xi_q$ represents the output of the final prediction result; $s_\xi$ is the weight of the linear regression layer; and $b_\xi$ represents the threshold of linear regression.
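For orientation, the gate computations described above can be written out in the textbook LSTM form using the symbols defined in this section. This is a sketch under stated assumptions rather than a reproduction of the paper's Equations 1-5: it assumes no peephole connections, and the superscripts on $s_x$ and $s_\alpha$, which distinguish the per-gate weight matrices, are introduced here for illustration only.

```latex
\begin{align}
i_q &= \sigma\left(s_x^{i} x_q + s_\alpha^{i} a_{q-1} + b_i\right) \\
\lambda_q &= \sigma\left(s_x^{\lambda} x_q + s_\alpha^{\lambda} a_{q-1} + b_\lambda\right) \\
\delta_q &= \lambda_q \circ \delta_{q-1} + i_q \circ g\left(s_x^{\delta} x_q + s_\alpha^{\delta} a_{q-1} + b_\delta\right) \\
j_q &= \sigma\left(s_x^{j} x_q + s_\alpha^{j} a_{q-1} + b_j\right) \\
a_q &= j_q \circ g\left(\delta_q\right)
\end{align}
```

The forget gate $\lambda_q$ scales the previous cell state $\delta_{q-1}$, which is what lets information persist across many time steps.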

Principle of standard PSO
If a potential solution to the optimisation problem is likened to a particle, the particle continuously navigates through the search space, adapting its position based on its own experience and the best experience of the population while seeking the optimal position. PSO starts from a set of random solutions, then iteratively searches for the optimal solution by tracking the current best particle. In a multi-dimensional search area, $\psi$ particles make up a population. In iteration $\beta$, the position and velocity of particle $\alpha$ are $\tau_{\alpha,\beta}$ and $\phi_{\alpha,\beta}$ respectively, and the particle updates its position and velocity by tracking two optimal solutions. The first is the optimal solution found independently by the particle itself, known as the individual extreme value $\eta_\alpha$. The second is the current optimal solution found by the entire population, denoted as the global optimal solution $\kappa_\alpha$. While seeking these two optimal solutions, the particle adjusts its velocity and position according to Equations 7 and 8, where $\varepsilon$ is the inertia weight; $\chi_1$ and $\chi_2$ are learning factors; $v$ is a random number in [0,1]; and $\iota$ is the velocity coefficient, $\iota = 1$.
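As a point of reference, the velocity and position updates of standard PSO are commonly written as below, using the symbols of this section. This is the textbook form and is offered as a sketch, not as a transcription of the paper's Equations 7 and 8. In particular, drawing two independent random numbers $v_1$ and $v_2$ is an assumption, since the text names only a single $v$.

```latex
\begin{align}
\phi_{\alpha,\beta+1} &= \varepsilon\,\phi_{\alpha,\beta}
  + \chi_1 v_1 \left(\eta_\alpha - \tau_{\alpha,\beta}\right)
  + \chi_2 v_2 \left(\kappa_\alpha - \tau_{\alpha,\beta}\right) \\
\tau_{\alpha,\beta+1} &= \tau_{\alpha,\beta} + \iota\,\phi_{\alpha,\beta+1}
\end{align}
```

The first term carries the particle's momentum, the second pulls it towards its own best position, and the third pulls it towards the population's best position.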

Principle of improved PSO
This paper enhances the inertia weight within the PSO algorithm and integrates the mutation mechanism of the genetic algorithm. This integration bolsters the search capabilities of the PSO algorithm, rendering it adaptable in its exploration abilities. The algorithm's fundamental concept involves formulating a mathematical model for the inertia weight, allowing for dynamic and adaptive adjustments throughout the entire iterative process. Early in the iteration, the inertia weight quickly diminishes, hastening the algorithm's convergence and refining its local optimisation capacity. Concurrently, the genetic algorithm's mutation mechanism regenerates particle positions during iterations to amplify population diversity. This combination enables the algorithm to explore beyond local optima, expanding its search across the solution space and fortifying global optimisation capabilities. The adaptive mutation concept balances the algorithm's local and global search abilities.
In the conventional PSO algorithm, the inertia weight serves to uphold the momentum of particle motion while balancing the algorithm's global and local search capabilities. A large inertia weight strengthens global search, weakens local search and accelerates algorithmic convergence, yet compromises optimisation accuracy. Conversely, a smaller inertia weight diminishes global search strength, enhances local search and improves search precision, yet slows the algorithm's convergence and elevates the risk of local optima. The judicious adjustment of the inertia weight therefore profoundly impacts the algorithm's performance. Setting the inertia weight to a fixed value will diminish global optimisation capabilities and decelerate the algorithm's convergence rate. Adaptive parameter adjustment is a technical method to improve convergence speed and optimise search performance. This paper suggests a computational approach to enhance PSO performance through the utilisation of nonlinearly changing inertia weights, calculated according to Equation 9, where $\zeta_{\max}$ and $\zeta_{\min}$ are the upper and lower limits of the inertia weight $\zeta$ respectively; $\rho$ is the current iteration number; and $\rho_{\max}$ is the maximum number of iterations. When $\rho$ is small, $\zeta$ approximates $\zeta_{\max}$. As $\rho$ increases, $\zeta$ decreases rapidly in a non-linear fashion. This ensures the algorithm's local optimisation capacity while enabling flexible adjustments between its global and local optimisation capabilities.
Inspired by the mutation mechanism in the genetic algorithm, an adaptive variation function is proposed to give particles the ability to jump out of the local range. When rand ≥ prob, the particles mutate and are distributed evenly across the entire space, increasing the randomness of the particles. The adaptive mutation probability is defined by Equations 10 and 11, where $\varsigma$ (prob above) represents the adaptive variation function, which ranges within [0.5, 1], and $\upsilon$ (rand above) is the random number it is compared with. When $\rho$ is small, $\varsigma$ has a low value, leading to a higher probability of $\upsilon \geq \varsigma$, so the particles are prone to mutation. This facilitates the particles' escape from local scopes, aiding global exploration. As $\rho$ increases, $\varsigma$ rises and the likelihood of $\upsilon \geq \varsigma$ falls, so the particle mutation probability decreases. This allows particles to conduct a broad global search before fine-tuning, lowering the chance of becoming stuck in local optima and ensuring comprehensive convergence.
To assess the impact of the enhanced particle swarm optimisation algorithm, the 5-dimensional Sphere test function was optimised separately using the standard PSO algorithm and the IPSO algorithm in a comparative experiment. The two algorithms were iterated 1000 times each. The results were amplified by logarithmic processing, and the resulting fitness curves are shown in Figure 1. As the logarithmic curves in Figure 1 show, the traditional PSO algorithm falls into a local optimal solution very early in the iteration and no longer has the ability to search for better solutions. The enhanced IPSO algorithm effectively breaks away from local optima, persistently seeking optimal solutions even after 1000 iterations, highlighting its distinct advantages.
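The two ingredients of the improved algorithm, a nonlinearly decaying inertia weight and an adaptive mutation step, can be sketched on the 5-dimensional Sphere function as follows. This is an illustrative sketch only. The exponential decay schedule, the linear rise of the mutation threshold from 0.5 to 1, and all names and default values (`ipso_sketch`, `w_max`, `w_min`, `c1`, `c2`, the search bounds) are assumptions made for the example and are not the paper's Equations 9-11.

```python
import numpy as np


def sphere(x):
    """Sphere test function: sum of squares, with minimum 0 at the origin."""
    return float(np.sum(x ** 2))


def ipso_sketch(dim=5, n_particles=20, n_iter=200, lo=-10.0, hi=10.0,
                w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """PSO with an ASSUMED nonlinear inertia schedule and adaptive mutation."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_fit = np.array([sphere(p) for p in pos])
    g = int(np.argmin(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), float(pbest_fit[g])
    history = [gbest_fit]

    for t in range(1, n_iter + 1):
        frac = t / n_iter
        # ASSUMED schedule: close to w_max at first, then a fast nonlinear decay.
        w = w_min + (w_max - w_min) * np.exp(-5.0 * frac)
        # ASSUMED threshold: rises from 0.5 to 1, so mutation becomes rarer.
        prob = 0.5 + 0.5 * frac
        for k in range(n_particles):
            r1, r2 = rng.random(dim), rng.random(dim)
            vel[k] = (w * vel[k]
                      + c1 * r1 * (pbest[k] - pos[k])
                      + c2 * r2 * (gbest - pos[k]))
            pos[k] = np.clip(pos[k] + vel[k], lo, hi)
            # Mutation: when rand >= prob, re-draw the particle uniformly.
            if rng.random() >= prob:
                pos[k] = rng.uniform(lo, hi, size=dim)
            fit = sphere(pos[k])
            if fit < pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k].copy(), fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[k].copy(), fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history
```

Because the global best is only ever replaced by a better position, the recorded fitness history is non-increasing even though mutated particles are scattered across the search space.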

IPSO optimisation of the LSTM algorithm
Leveraging traffic speed and spatio-temporal data, a combined digital-analogue approach is employed for traffic speed prediction. Building upon the aforementioned theory, this study introduces an IPSO-optimised algorithm for the LSTM model.
LSTM models can succumb to overfitting when trained on limited data, thereby jeopardising prediction accuracy. To address this concern, regularisation methods are employed, notably the inclusion of a Dropout layer. This layer selectively deactivates neurons during training, preventing the network from relying too heavily on specific nodes and features. By injecting variability into hidden layer neurons across training iterations, the model's dependency on fixed node relationships is diminished, thus mitigating the risk of feature reliance. Dropout regularisation serves to diminish the impact of individual weight contributions, thereby mitigating model bias and curtailing the risk of overfitting in intricate neural network architectures.
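The mechanism of a Dropout layer can be illustrated with a few lines of NumPy. The sketch below uses the common "inverted dropout" formulation, in which surviving activations are rescaled during training so that no adjustment is needed at prediction time. The function name `dropout` and its parameters are hypothetical, and the dropout rate used in the paper is not stated in this section.

```python
import numpy as np


def dropout(activations, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        # At prediction time the layer passes activations through unchanged.
        return activations
    keep = 1.0 - rate
    # Each unit is kept independently with probability `keep`.
    mask = rng.random(activations.shape) < keep
    # Dividing by `keep` preserves the expected value of each activation.
    return activations * mask / keep
```

A different random mask is drawn on every call, which is what prevents the network from depending on any fixed set of neurons.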
The framework of the traffic speed prediction model, which integrates diverse data sources, is depicted in Figure 2. The algorithmic steps are outlined as follows:
Step 1: Data pre-processing. By analysing the multi-source correlation of the speed data, multiple features with strong correlation were screened out, and the feature sequences of the influencing factors were used directly as additional features. The time series data were processed with a sliding window of size 6 that moves forward one data point at a time; six time points represent a duration of 1 hour. The model uses the six time points in each window to predict the subsequent traffic speed.
Step 2: Initialise the parameters by setting the particle positions and velocities. A population particle $X_{i,0} = (a_1, a_2, \varepsilon, \iota)$ is randomly generated, where $a_1$ denotes the number of neurons in the first hidden layer; $a_2$ the number of neurons in the second hidden layer; $\varepsilon$ the learning rate of the LSTM; and $\iota$ the number of LSTM training iterations. The population size, iteration count, learning factors, positions and velocities are established. Each particle is encoded as an array containing four elements, representing the learning rate, the number of iterations, the number of neurons in the first hidden layer and the number of neurons in the second hidden layer, respectively. By adjusting the ranges of these parameters, we can control the magnitude of parameter changes during mutation, thereby influencing the extent of exploration in the search space. Furthermore, we introduce adaptive mutation operations to avoid premature convergence to local optima. Additionally, by adjusting the range of the inertia weight, we influence the magnitude of particle velocity updates, thereby further optimising the search process of the algorithm and improving convergence speed and search efficiency.
Step 3: Establish the particle's evaluation function by assigning the LSTM parameters from the particle $X_{i,0}$ acquired in Step 2. Segment the data into training, validation and prediction samples. The training data are used to train the neural network, yielding the output values $\hat{yt}_j$ for the training samples and $\hat{yv}_k$ for the validation samples once the iteration limit is reached. The fitness value $fit_i$ for individual $X_i$ is defined using the mean square error (MSE) function in Equation 12, where $yt_j$ and $yv_k$ represent the expected output values of the training and validation samples, respectively. The conventional approach in prior research uses the fitting error of the training samples as the sole fitness value. However, this practice might lead to suboptimal predictive outcomes if the neural network overfits the training samples.
Step 4: Compute the fitness value linked to position $X_i$ for each particle. Individual and population extreme values are established based on the initial fitness values of the particles, with each particle adopting its historical best position as its optimal position.
Step 5: During each iteration, the particle's velocity and position are updated using Equations 7 and 8, incorporating both the individual and global extreme values. Following this update, the new fitness value for the particle is computed. The individual and population extreme values are then adjusted based on the fitness values of the updated particles in the population.
Step 6: Once the PSO algorithm reaches its maximum iteration count, the data to be forecast are fed into the LSTM model trained using the optimal particle. The output is the predicted traffic speed.
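Three pieces of the procedure above lend themselves to a short sketch: the sliding window of Step 1, the four-element particle encoding of Step 2, and a fitness that uses both training and validation error as in Step 3. The helper names (`make_windows`, `decode_particle`, `fitness`) are hypothetical, and the equal weighting of the two MSE terms is an assumption, since Equation 12 is not reproduced in this section.

```python
import numpy as np


def make_windows(features, target, window=6):
    """Step 1: slide a window of `window` points forward one point at a time."""
    X, y = [], []
    for start in range(len(features) - window):
        X.append(features[start:start + window])  # six consecutive time points
        y.append(target[start + window])          # the speed that follows them
    return np.array(X), np.array(y)


def decode_particle(particle):
    """Step 2: map a position (learning rate, iterations, h1, h2) to settings."""
    lr, epochs, h1, h2 = particle
    return {
        "learning_rate": float(lr),
        "epochs": int(round(epochs)),   # counts must be whole numbers
        "units_1": int(round(h1)),
        "units_2": int(round(h2)),
    }


def mse(pred, true):
    """Mean square error between predictions and expected values."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean((pred - true) ** 2))


def fitness(train_pred, train_true, val_pred, val_true):
    """Step 3: combine training and validation MSE (ASSUMED equal weights)."""
    return 0.5 * (mse(train_pred, train_true) + mse(val_pred, val_true))
```

For example, `decode_particle([0.0074, 47.2, 80.6, 12.3])` yields a learning rate of 0.0074, 47 iterations and hidden layers of 81 and 12 neurons, matching the values at which the search is reported to settle. In a full implementation, `fitness` would be evaluated after training an LSTM with the decoded settings; that training step is omitted here.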

Model evaluation indicator
To assess the model's performance, the accuracy of the traffic speed prediction model is appraised using several evaluation metrics: the mean absolute percentage error (MAPE), root mean square error (RMSE) and mean absolute error (MAE). The specific formulas are given in Equations 13-15.
In this context, $\psi_r^*$ denotes the traffic speed forecast by the time series prediction model in period $r$; $\psi_r$ stands for the actual traffic speed; and $A$ signifies the number of predicted values. Larger MAPE, RMSE and MAE values indicate a greater margin of error within the model.
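The three metrics have standard definitions, which the following sketch implements with NumPy. The function names are chosen for illustration. MAPE is returned as a percentage, and it assumes that no actual speed is zero.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual must be non-zero)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


def rmse(actual, predicted):
    """Root mean square error, in the units of the data."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual, predicted):
    """Mean absolute error, in the units of the data."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(actual - predicted)))
```

With actual speeds of 50, 40 and 20 and predictions of 45, 44 and 20, the absolute errors are 5, 4 and 0, giving an MAE of 3.0, an RMSE of about 3.70 and a MAPE of about 6.67%.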

Data sources
The empirical study utilises traffic speed data provided by the Guangzhou Transportation Commission of China as the experimental dataset. This dataset covers 214 sections, primarily urban expressways and trunk roads, spanning from 1 August 2016 to 30 September 2016, a 61-day period. Speed data for each section are aggregated at 10-minute intervals throughout each day.
To comprehensively consider the influence of relevant characteristic variables on traffic speed, the characteristic variables affecting the urban traffic state are input into the prediction model. Among them, air quality variables include the concentrations of SO2, NO2, CO and PM2.5 recorded hourly by monitoring stations. Meteorological features include real-time temperature, precipitation and weather conditions recorded hourly by weather stations (the codes "0-7" correspond to weather conditions as follows: "sunny, cloudy, cloudy, light rain, moderate rain, heavy rain, shower, rainstorm"). The time attributes are the hour of the day (on a 24-hour clock) and the day of the week (code "1" indicates Monday) at each time point. Whether the day is a working day (code "1" indicates a working day and code "0" a non-working day) is also recorded.
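The time attributes described above can be derived directly from a timestamp. The sketch below is a minimal illustration with a hypothetical helper name, `time_features`. It assumes that every Monday to Friday is a working day, which ignores public holidays and make-up working days that the original dataset may encode differently.

```python
from datetime import datetime


def time_features(ts):
    """Return (hour of day, day of week with Monday = 1, working-day flag)."""
    hour = ts.hour
    day_of_week = ts.isoweekday()  # Monday = 1 ... Sunday = 7
    # ASSUMPTION: Monday-Friday are working days; public holidays are ignored.
    working_day = 1 if day_of_week <= 5 else 0
    return hour, day_of_week, working_day
```

For instance, the first day of the dataset, 1 August 2016, was a Monday, so a record at 08:10 on that day maps to `(8, 1, 1)`.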

Data processing
We processed and analysed the traffic data, revealing that the current traffic speed is similar to that of recent periods, with the most recent periods having the most significant impact and the correlation decreasing over time. The traffic speed data exhibit distinct daily and weekly periodic characteristics. There is spatial correlation among traffic speeds across different road segments. We selected external factors, such as PM2.5, SO2, CO, weather and rainfall, whose correlation coefficients with the speed data exceed 0.1. Subsequently, we normalised the various datasets based on these findings. Normalisation of the traffic speed and meteorological characteristic data involves a linear transformation of the original data, ensuring that the resultant values are mapped within the [0,1] range. The formula is shown in Equation 16:
$$\mu = \frac{\theta - \theta_{\min}}{\theta_{\max} - \theta_{\min}} \tag{16}$$
where $\mu$ is the scaled data and $\theta$ is the raw data of traffic speed and meteorological characteristics. The original data range from a minimum of $\theta_{\min}$ to a maximum of $\theta_{\max}$. This paper divides the dataset, allocating 90% for training and reserving 10% for testing. An LSTM model optimised through IPSO is utilised to forecast the future traffic speed specifically for Road_1. The sequence data were rearranged to construct the feature matrix, as shown in Table 1.
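The normalisation and the 90/10 division can be sketched as below. The helper names are hypothetical. Splitting in chronological order (earliest 90% for training) is an assumption, as the text does not state how the samples were ordered before the split. Passing the training minimum and maximum explicitly when scaling test data is likewise a suggested practice, not something the paper describes.

```python
import numpy as np


def min_max_scale(theta, theta_min=None, theta_max=None):
    """Linearly map raw values into [0, 1] using the column minimum and maximum."""
    theta = np.asarray(theta, float)
    theta_min = theta.min(axis=0) if theta_min is None else theta_min
    theta_max = theta.max(axis=0) if theta_max is None else theta_max
    return (theta - theta_min) / (theta_max - theta_min)


def chronological_split(data, train_fraction=0.9):
    """ASSUMED split: the earliest 90% for training, the final 10% for testing."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]
```

For example, speeds of 20, 30 and 60 scale to 0, 0.25 and 1, and a series of 100 samples is divided into 90 training and 10 test samples.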

Experimental environment
The experiment is based on a Python 2.8.1 environment and coded with the TensorFlow 1.14.0 deep learning framework. The editing platform is Jupyter Notebook 1.0.0 and the operating system is Windows 10. The study constructed eight prediction models, namely: IPSO-optimised LSTM model with one hidden layer (IPSO-LSTM_1); PSO-optimised LSTM model with one hidden layer (PSO-LSTM_1); IPSO-optimised LSTM model with two hidden layers (IPSO-LSTM_2); PSO-optimised LSTM model with two hidden layers (PSO-LSTM_2); general LSTM model; bidirectional LSTM model (BiLSTM); GRU model; and BP neural network model (BPNN). To ensure experimental fairness, these models employ identical traffic data feature inputs and the same Adam optimiser for comparative analysis in traffic speed prediction.

Initial parameter settings
Table 2 shows the detailed initialisation parameter settings for the IPSO-LSTM model. Details of the LSTM network training and IPSO algorithm optimisation are described below.

IPSO optimisation parameters determination
Figure 4 illustrates the fitness evolution of the LSTM optimised using particle swarm optimisation. Here, the suffixes 1 and 2 denote the LSTM model with single and dual hidden layers, respectively. The left panel of Figure 4 shows the fitness comparison of IPSO-LSTM_1 and PSO-LSTM_1, while the right panel gives the fitness comparison between IPSO-LSTM_2 and PSO-LSTM_2. The IPSO optimisation of the LSTM demonstrates a reduced fitness value and rapid convergence towards the minimum fitness level. This highlights the IPSO's superior global optimisation capability and faster convergence rate.
Figure 5 displays the changes in the learning rate, iteration times and hidden layer node number with the iteration times of the IPSO algorithm. Figure 5a indicates that the learning rate $\varepsilon$ fluctuates during iterations before stabilising at 0.0074, while Figure 5b indicates that the iteration number $n$ of the LSTM fluctuates over PSO iterations and eventually steadies at 47. In Figure 5c, the neuron count within the initial hidden layer, $h_1$, finally settles at 81. In Figure 5d, the neuron count $h_2$ in the second hidden layer settles at 12.

DISCUSSION
To better explain the prediction performance of the models, Table 3 compares the prediction performance of each model using identical sample inputs. In comparison to the MAPE of BPNN, LSTM, GRU, Bi-LSTM, Conv-LSTM and CNN-LSTM, the prediction accuracy of IPSO-LSTM_2 is improved by 20.96%, 14.24%, 26.13%, 23.36%, 22.73% and 13.90%, respectively, showing the best prediction performance. Therefore, following the enhancement of the particle swarm optimisation algorithm and its integration with the LSTM, the model fitting effect is effectively enhanced and the prediction accuracy is significantly improved.
To validate the effectiveness of our model across different prediction time ranges, we conducted prediction experiments on various time intervals using the Guangzhou traffic speed dataset. The specific results are presented in Table 4. We found that our model performs excellently in the prediction tasks of 10, 20 and 30 minutes, with performance improving as the prediction time extends.

Robustness analysis
After conducting experiments on the dataset from Guangzhou, we have demonstrated the robustness and generalisation capability of our model. To further validate our findings, we repeated the comparison on similar traffic speed data from Shenzhen, as shown in Table 5. The experimental results indicate that our model IPSO-LSTM_2 outperforms PSO-LSTM_2 in traffic speed prediction tasks in both Guangzhou and Shenzhen, confirming the robustness and generalisation of our model.
We conducted an analysis of traffic speed on weekdays and weekends, revealing significant daily and weekly periodic features in the data. On weekdays, the traffic speed time series for the same road segment is similar to that of the previous day. Although the patterns differ between weekends and weekdays, the similarity between weekends also serves as an important basis for traffic prediction. Additionally, we used the IPSO-LSTM_2 model for prediction, and the accuracy curves are shown in Figure 7c. It can be observed from the figure that our model demonstrates good performance and accuracy in both weekday and weekend speed prediction tasks.

CONCLUSIONS
This study proposes a traffic speed prediction model based on multi-feature data fusion and IPSO-LSTM. Firstly, we optimised the PSO by integrating nonlinear weight variations and incorporating the mutation mechanism of genetic algorithms. These modifications enhanced the global optimisation capability of the PSO and accelerated its convergence speed. The fusion of the PSO and the LSTM involves optimising the number of hidden neurons, learning rate and iterations of the LSTM using the PSO. This approach alleviates the limitations of manually defining LSTM parameters, thereby improving the accuracy of traffic speed prediction. Compared to the PSO, the IPSO achieves superior optimisation of the LSTM parameters, enhancing the overall predictive performance. Furthermore, we input data related to speed prediction, such as PM2.5, SO2, CO, weather and rainfall, into the model to improve the accuracy of traffic speed prediction. Lastly, evaluation of performance metrics such as the MAPE, RMSE and MAE reveals that the proposed LSTM neural network, integrated with multi-source data fusion and IPSO optimisation, outperforms other baseline models in terms of prediction accuracy and demonstrates good performance across different datasets and time intervals.
In summary, the proposed speed prediction method shows promising feasibility and provides a tailored research foundation for urban road traffic management. Considering the limited availability of device resources, this study conducted model training with a small sample size. Future exploration could focus on further researching big data fusion techniques, expanding the experimental sample, and more extensive applications in traffic state monitoring research.

Figure 4 - Comparison of fitness curves

Figure 5 - The optimal values of $\varepsilon$, $n$, $h_1$ and $h_2$ in the IPSO-LSTM_2: a) Learning rate; b) Number of iterations of LSTM; c) Number of the neurons within the initial hidden layer; d) Number of the neurons within the secondary hidden layer

Figure 6 - Prediction results of LSTM optimised by PSO and IPSO

Figure 7 - Periodic characteristics and prediction accuracy analysis: a) Speed data of all sections in a day; b) Speed data of all sections in a week; c) The prediction accuracy curve

Table 1 - Characteristic matrix of multi-source

Table 3 - Prediction errors of other models

Table 4 - Comparison of prediction performance at different time intervals

Table 5 - Comparison of predictive performance of different datasets