Intent Recognition and Trajectory Prediction for Multiple Types of Traffic Participants at an Unsignalized Intersection Based on Bidirectional Spatiotemporal Attention Network

Keywords: unsignalized intersection; bidirectional spatiotemporal attention network (Bi-STANet); (2+1)D CNN; multiple types of traffic participants; intent recognition; trajectory prediction

Authors

  • Yanan HOU
    School of Civil and Transportation Engineering, Hebei University of Technology, Tianjin, China
  • Mingbao PANG
    pmbpgy@hebut.edu.cn
    School of Civil and Transportation Engineering, Hebei University of Technology, Tianjin, China
  • Huamin LIANG
    School of Civil Engineering, Tianjin Chengjian University, Tianjin, China


This work investigates intent recognition and trajectory prediction for multiple types of traffic participants at an unsignalized intersection in a connected intelligent environment, based on a bidirectional spatiotemporal attention network (Bi-STANet). The studied unsignalized intersection involves Connected and Automated Vehicles (CAVs), Human-driven Vehicles (HVs), bicyclists, and pedestrians. A novel method based on Bi-STANet is proposed. First, a multimodal spatiotemporal feature extraction model is constructed on a (2+1)-dimensional CNN ((2+1)D CNN): a grid encoding method unifies the spatial structure of the scene, and 2D convolution extracts spatial features to capture the disordered characteristics of the participants. Temporal dynamics are modelled by 1D convolution along the time axis, which decouples spatial and temporal feature extraction. Second, a bidirectional dynamic interaction model is developed by integrating LSTM-based temporal feature extraction with (2+1)D CNN layers, and heterogeneous modalities are fused through a Bidirectional Contextual Block (BiCoBlock). Finally, a model integrating dynamic interaction, intent recognition, and trajectory prediction is developed. The proposed method is validated on the inD dataset. The results show that the average intent recognition accuracy reaches 95.4%. Within a 3-second horizon, the Average Displacement Error (ADE) and Final Displacement Error (FDE) are reduced to 0.51 m and 0.64 m, respectively, outperforming the best baseline model. In ablation studies, the intent recognition F1-score is improved by 7.2%, and the ADE and FDE of trajectory prediction are improved by 41.4% and 39.0%, respectively.
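The abstract states that a grid encoding method unifies the spatial structure of a scene whose participants arrive in no particular order. The sketch below is an illustration under assumptions, not the authors' encoder: it rasterizes one frame of (type, x, y) observations into a per-type occupancy grid, so the result no longer depends on the order in which participants are listed. The function name `grid_encode`, the constant `PARTICIPANT_TYPES`, the one-channel-per-type layout, and the cell-count encoding are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical channel layout: one occupancy channel per participant type.
PARTICIPANT_TYPES = ("CAV", "HV", "bicyclist", "pedestrian")


def grid_encode(
    participants: List[Tuple[str, float, float]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    cell_size: float,
) -> List[List[List[int]]]:
    """Encode one frame of unordered participants as a C x H x W occupancy grid.

    participants: (type, x, y) tuples in metres; their order does not matter.
    Returns grid[channel][row][col] holding the number of participants of that
    type whose position falls inside the cell.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    width = int((x_max - x_min) // cell_size)
    height = int((y_max - y_min) // cell_size)
    grid = [[[0] * width for _ in range(height)] for _ in PARTICIPANT_TYPES]
    for ptype, x, y in participants:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Participants outside the modelled area are dropped.
        if 0 <= row < height and 0 <= col < width:
            # Raises ValueError for a type that is not in PARTICIPANT_TYPES.
            grid[PARTICIPANT_TYPES.index(ptype)][row][col] += 1
    return grid


frame = [("CAV", 1.0, 1.0), ("pedestrian", 3.5, 0.2)]
occupancy = grid_encode(frame, (0.0, 4.0), (0.0, 4.0), cell_size=1.0)
```

Because each participant is written into a cell determined only by its type and position, shuffling `frame` produces the same grid. That property is what lets a fixed-size 2D convolution operate on a variable, unordered set of road users.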
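The (2+1)D CNN described in the abstract factorizes spatiotemporal convolution into a 2D convolution over each frame followed by a 1D convolution along the time axis. The minimal single-channel sketch below shows only that factorization, assuming "valid" padding, no bias, and no activation between the two steps. The helper names `conv2d_valid`, `conv1d_time`, and `conv2plus1d` are hypothetical. The paper's layers would use learned multi-channel kernels and are not reproduced here.

```python
from typing import List

Frame = List[List[float]]  # one H x W spatial map


def conv2d_valid(frame: Frame, kernel: Frame) -> Frame:
    """Spatial step: 'valid' 2D cross-correlation of one frame (as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(frame) - kh + 1, len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv1d_time(frames: List[Frame], kernel: List[float]) -> List[Frame]:
    """Temporal step: 'valid' 1D cross-correlation along time, applied per spatial cell."""
    kt = len(kernel)
    h, w = len(frames[0]), len(frames[0][0])
    return [
        [
            [sum(frames[t + k][i][j] * kernel[k] for k in range(kt)) for j in range(w)]
            for i in range(h)
        ]
        for t in range(len(frames) - kt + 1)
    ]


def conv2plus1d(frames: List[Frame], k_spatial: Frame, k_temporal: List[float]) -> List[Frame]:
    """(2+1)D block: 2D conv on every frame independently, then 1D conv over time."""
    spatial = [conv2d_valid(f, k_spatial) for f in frames]
    return conv1d_time(spatial, k_temporal)


# Three 3x3 frames whose values grow by 1 per time step.
clip = [[[float(t)] * 3 for _ in range(3)] for t in range(3)]
features = conv2plus1d(clip, [[1.0, 1.0], [1.0, 1.0]], [-1.0, 1.0])
```

Here the 2x2 spatial kernel sums each neighbourhood (giving 4t in frame t), and the temporal kernel `[-1.0, 1.0]` takes the frame-to-frame difference, so every output value is 4.0. Separating the two steps is what the abstract calls spatiotemporal decoupling: spatial layout and temporal change are filtered by different kernels.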
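The trajectory results are reported as ADE and FDE over a 3-second horizon. Under the standard definitions, ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted time steps, and FDE is the distance at the last predicted step. The sketch below computes both for one trajectory and averages them over a set of trajectories. The function names are hypothetical, and the number of steps in a 3-second horizon depends on the sampling rate used, which is not stated here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in metres


def ade_fde(pred: List[Point], truth: List[Point]) -> Tuple[float, float]:
    """ADE and FDE for one predicted trajectory against its ground truth."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    errors = [math.dist(p, g) for p, g in zip(pred, truth)]  # per-step L2 error
    return sum(errors) / len(errors), errors[-1]


def mean_ade_fde(preds: List[List[Point]], truths: List[List[Point]]) -> Tuple[float, float]:
    """Average per-trajectory ADE and FDE over several participants."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need the same non-zero number of predictions and ground truths")
    pairs = [ade_fde(p, g) for p, g in zip(preds, truths)]
    return (
        sum(a for a, _ in pairs) / len(pairs),
        sum(f for _, f in pairs) / len(pairs),
    )


ade, fde = ade_fde([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
```

In this example the per-step errors are 0, 1, and 2 m, so ADE is 1.0 m and FDE is 2.0 m. FDE isolates the endpoint error, which tends to grow with the prediction horizon, whereas ADE summarizes the whole path.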