Intent Recognition and Trajectory Prediction for Multiple Types of Traffic Participants at an Unsignalized Intersection Based on Bidirectional Spatiotemporal Attention Network
This work investigates intent recognition and trajectory prediction for multiple types of traffic participants at an unsignalized intersection in a connected intelligent environment, and proposes a method based on a bidirectional spatiotemporal attention network (Bi-STANet). The participants at the studied intersection include Connected and Automated Vehicles (CAVs), Human-driven Vehicles (HVs), bicyclists, and pedestrians. First, a multimodal spatiotemporal feature extraction model is constructed based on a (2+1)-dimensional CNN ((2+1)D CNN): a grid encoding method unifies the spatial structure, and 2D convolution extracts spatial features to capture the disordered characteristics of participants. Temporal dynamics are modelled via 1D convolution along the time axis, which decouples spatial and temporal feature extraction. Second, a bidirectional dynamic interaction model is developed by integrating LSTM-based temporal feature extraction with the (2+1)D CNN layers, where the heterogeneous modalities are fused through a Bidirectional Contextual Block (BiCoBlock). Finally, a model integrating dynamic interaction, intent recognition, and trajectory prediction is developed. The proposed method is validated on the inD dataset. The results show that the average accuracy of intent recognition reaches 95.4%. Within a 3-second prediction horizon, the Average Displacement Error (ADE) and Final Displacement Error (FDE) are reduced to 0.51 m and 0.64 m, respectively, both lower than those of the best baseline model. In ablation studies, the intent recognition F1-score is improved by 7.2%, and the ADE and FDE of trajectory prediction are improved by 41.4% and 39.0%, respectively.
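For readers unfamiliar with the two trajectory metrics reported above, the following is a minimal sketch of how ADE and FDE are commonly computed for a single predicted trajectory. It uses the standard definitions (mean and final-step Euclidean error); the function name `displacement_errors` and the sample coordinates are illustrative assumptions, not the authors' evaluation code or data.

```python
import math


def displacement_errors(predicted, ground_truth):
    """Return (ADE, FDE) in the same units as the input coordinates.

    Both arguments are sequences of (x, y) positions, one per time step.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between predicted and true position at each step.
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    ade = sum(errors) / len(errors)  # average over the whole horizon
    fde = errors[-1]                 # error at the final time step only
    return ade, fde


# Hypothetical three-step trajectory in metres (not data from the paper).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.6)]
ade, fde = displacement_errors(pred, truth)
```

In a multi-agent evaluation, these per-trajectory values would typically be averaged over all participants and samples; how the paper aggregates them is not stated in the abstract.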
Copyright (c) 2026 Yanan Hou, Mingbao Pang, Huamin Liang

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.













