

In order to improve the autonomous ability of unmanned aerial vehicles (UAVs) to carry out air combat missions, many artificial intelligence-based studies of autonomous air combat maneuver decision-making have been carried out, but these studies are often aimed at individual decision-making in 1v1 scenarios, which rarely happen in actual air combat. Based on the research on 1v1 autonomous air combat maneuver decisions, this paper builds a multi-UAV cooperative air combat maneuver decision model based on multi-agent reinforcement learning. Firstly, a bidirectional recurrent neural network (BRNN) is used to achieve communication between individual UAVs, and the multi-UAV cooperative air combat maneuver decision model is established under the actor-critic architecture.

Neural networks are effective function approximators, but are hard to train in the reinforcement learning (RL) context, mainly because samples are correlated. In complex problems, a neural RL approach is often able to learn a better solution than tabular RL, but generally takes longer. This paper proposes two methods, Discrete-to-Deep Supervised Policy Learning (D2D-SPL) and Discrete-to-Deep Supervised Q-value Learning (D2D-SQL), whose objective is to acquire the generalisability of a neural network at a cost nearer to that of a tabular method. Both methods combine RL and supervised learning (SL) and are based on the idea that a fast-learning tabular method can generate off-policy data to accelerate learning in neural RL. D2D-SPL uses the data to train a classifier which is then used as a controller for the RL problem. D2D-SQL uses the data to initialise a neural network which is then allowed to continue learning using another RL method. We demonstrate the viability of our algorithms with Cartpole, Lunar Lander and an aircraft manoeuvring problem, three continuous-space environments with low-dimensional state variables. Both methods learn at least 38% faster than baseline methods and yield policies that outperform them.

Unmanned Aircraft Systems (UAS) have the potential to perform many of the dangerous missions currently flown by manned aircraft. Yet the complexity of some tasks, such as air combat, has precluded UAS from successfully carrying out these missions autonomously. This paper presents a formulation of a level flight, fixed velocity, one-on-one air combat maneuvering problem and an approximate dynamic programming (ADP) approach for computing an efficient approximation of the optimal policy. In the version of the problem formulation considered, the aircraft learning the optimal policy is given a slight performance advantage. The ADP approach provides a fast response to a rapidly changing tactical situation, long planning horizons, and good performance without explicit coding of air combat tactics. The method's success is due to extensive feature development, reward shaping and trajectory sampling. An accompanying fast and effective rollout-based policy extraction method is used to accomplish on-line implementation. Simulation results are provided that demonstrate the robustness of the method against an opponent beginning from both offensive and defensive situations. Flight results are also presented using micro-UAS flown at MIT's Real-time indoor Autonomous Vehicle test ENvironment.
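
The D2D-SPL idea described earlier (a fast tabular learner generates data, and a supervised classifier trained on that data becomes the controller) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional environment, the uniformly random behaviour policy, the way state-action pairs are collected, and the use of a logistic regression as a stand-in for the neural classifier. All function names and constants are hypothetical.

```python
import math
import random

N_BINS, STEP, HORIZON = 10, 0.1, 20  # hypothetical toy settings


def env_step(x, action):
    """Toy 1-D task (assumed): action 1 moves right, action 0 moves left; reward is -|x|."""
    x = max(-1.0, min(1.0, x + (STEP if action == 1 else -STEP)))
    return x, -abs(x)


def to_bin(x):
    """Discretise the continuous state into one of N_BINS equal-width bins."""
    return min(int((x + 1.0) / 2.0 * N_BINS), N_BINS - 1)


def tabular_q_learning(episodes=2000, alpha=0.1, gamma=0.9):
    """Off-policy tabular Q-learning driven by a uniformly random behaviour policy."""
    rng = random.Random(0)
    q = [[0.0, 0.0] for _ in range(N_BINS)]
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(HORIZON):
            a = rng.randrange(2)
            nx, r = env_step(x, a)
            s, ns = to_bin(x), to_bin(nx)
            q[s][a] += alpha * (r + gamma * max(q[ns]) - q[s][a])
            x = nx
    return q


def greedy(q, x):
    """Greedy action of the tabular policy for a continuous state."""
    row = q[to_bin(x)]
    return 0 if row[0] >= row[1] else 1


def collect_dataset(q, n_rollouts=50):
    """Roll out the greedy tabular policy and record (continuous state, action) pairs."""
    rng = random.Random(1)
    data = []
    for _ in range(n_rollouts):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(HORIZON):
            a = greedy(q, x)
            data.append((x, a))
            x, _ = env_step(x, a)
    return data


def train_classifier(data, epochs=200, lr=0.5):
    """Fit p(a=1|x) = sigmoid(w*x + b) by full-batch gradient descent on the log loss."""
    w, b, n = 0.0, 0.0, len(data)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, a in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - a) * x
            gb += p - a
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def run_controller(params, x0):
    """Use the trained classifier directly as the controller for one episode."""
    w, b = params
    x = x0
    for _ in range(HORIZON):
        x, _ = env_step(x, 1 if w * x + b > 0 else 0)
    return x


q_table = tabular_q_learning()
dataset = collect_dataset(q_table)
params = train_classifier(dataset)
final_left = run_controller(params, -0.9)
final_right = run_controller(params, 0.9)
print(final_left, final_right)
```

In this sketch the classifier sees the raw continuous state rather than the bin index, so it can place its decision boundary between bin edges. That is the kind of generalisation the abstract attributes to the neural model, shown here only on a deliberately simple problem.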
