Reinforcement Learning-Based Routing Protocols in Vehicular and Flying Ad Hoc Networks – A Literature Survey


  • Pavle Bugarčić, University of Belgrade, Faculty of Transport and Traffic Engineering
  • Nenad Jevtić, University of Belgrade, Faculty of Transport and Traffic Engineering
  • Marija Malnar, University of Belgrade, Faculty of Transport and Traffic Engineering



Keywords: reinforcement learning, Q-learning, routing protocols, VANET, FANET, ITS


Vehicular and flying ad hoc networks (VANETs and FANETs) are becoming increasingly important with the development of smart cities and intelligent transportation systems (ITSs). The high mobility of nodes in these networks leads to frequent link breaks, which complicates the discovery of an optimal route from source to destination and degrades network performance. One way to overcome this problem is to use machine learning (ML) in the routing process, and the most promising ML type for this purpose is reinforcement learning (RL). Although there are several surveys on RL-based routing protocols for VANETs and FANETs, the important issue of integrating RL with well-established modern technologies, such as software-defined networking (SDN) or blockchain, has not been adequately addressed, especially for complex ITSs. In this paper, we present a comprehensive categorisation of RL-based routing protocols for both network types, bearing in mind their simultaneous use and their integration with other technologies. A detailed comparative analysis of the protocols is carried out based on the different factors that influence the reward function in RL and the consequences they have on network performance. The key advantages and limitations of RL-based routing are also discussed in detail.






How to Cite

Bugarčić, P., Jevtić, N., & Malnar, M. (2022). Reinforcement Learning-Based Routing Protocols in Vehicular and Flying Ad Hoc Networks – A Literature Survey. Promet - Traffic&Transportation, 34(6), 893–906.