Teaching an RL agent to land on the Moon — from 2D prototypes to 3D physics simulations
Autonomous lunar landing is a demanding control problem. The agent must manage fuel, attitude, and descent rate simultaneously, and because there is no atmosphere to provide aerodynamic braking, all deceleration has to come from the engines. I wanted to see if a reinforcement learning agent could learn this from scratch, with no pre-programmed knowledge of flight dynamics.
I started with the LunarLander-v2 environment from Gymnasium (the Farama Foundation's maintained successor to OpenAI Gym), a simplified 2D problem with discrete thrust actions. I implemented a Dueling Double Deep Q-Network (D3QN) in PyTorch that achieved a >95% landing success rate after training.
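The "dueling" and "double" parts of D3QN each reduce to a few lines of arithmetic. The sketch below shows them in plain Python so it runs without PyTorch. In a PyTorch implementation the same operations would act on batched tensors from the network's value and advantage heads and from separate online and target networks. The function names and the default discount factor are illustrative assumptions, not the project's actual code or values.

```python
from typing import List, Sequence


def dueling_q_values(value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a).

    Subtracting the mean advantage makes the V/A split identifiable:
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(
    reward: float,
    done: bool,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    gamma: float = 0.99,
) -> float:
    """Double DQN bootstrap target for a single transition.

    The online network *selects* the greedy next action and the target
    network *evaluates* it, instead of taking max over the target network.
    """
    if done:
        return reward  # terminal state: no bootstrapping
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]
```

LunarLander's discrete version has four actions (do nothing, fire left engine, fire main engine, fire right engine), so each call would receive four advantages or Q-values per state. Letting one network choose the action while the other scores it is what curbs the overestimation bias of plain DQN's max operator: with `q_target_next = [0.5, 1.0, 6.0, 0.0]` and an online network preferring action 1, the target uses 1.0 rather than the inflated 6.0.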
Add training curves and 2D landing footage here: public/assets/
The 2D environment was a useful proof of concept, but real lunar landers operate in 3D with continuous thrust control. I built a custom rigid-body simulation in MuJoCo using lunar gravity (1.62 m/s²) and modeled a lander with 5 individually controllable thrusters.
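MuJoCo integrates the full rigid-body dynamics, but the core physics of a powered descent fits in a few lines. Here is a hedged 1-D vertical sketch under lunar gravity that also tracks the propellant burned to produce thrust, using the rocket relation mass flow = thrust / (Isp · g0). The dry mass and specific impulse are hypothetical values chosen for illustration, not parameters from the MuJoCo model.

```python
from dataclasses import dataclass

LUNAR_GRAVITY = 1.62  # m/s^2, the value used in the simulation
G0 = 9.80665          # standard gravity; only used to convert Isp into mass flow


@dataclass
class LanderState:
    altitude: float   # m, positive up
    velocity: float   # m/s, positive up
    fuel_mass: float  # kg of remaining propellant


def step(state: LanderState, thrust: float, dt: float,
         dry_mass: float = 1000.0, isp: float = 300.0) -> LanderState:
    """Advance a 1-D vertical lander by one semi-implicit Euler step.

    dry_mass (kg) and isp (s) are hypothetical illustration values.
    """
    if state.fuel_mass <= 0.0:
        thrust = 0.0  # tanks empty: the engine can no longer fire
    mass = dry_mass + state.fuel_mass
    accel = thrust / mass - LUNAR_GRAVITY       # net upward acceleration
    velocity = state.velocity + accel * dt
    altitude = state.altitude + velocity * dt   # uses the *updated* velocity
    mass_flow = thrust / (isp * G0)             # kg/s of propellant burned
    fuel = max(0.0, state.fuel_mass - mass_flow * dt)
    return LanderState(altitude, velocity, fuel)


if __name__ == "__main__":
    # Start with thrust equal to the lander's initial lunar weight.
    s = LanderState(altitude=100.0, velocity=0.0, fuel_mass=500.0)
    hover = (1000.0 + s.fuel_mass) * LUNAR_GRAVITY
    for _ in range(100):
        s = step(s, hover, dt=0.01)
    print(s)
```

Because the lander gets lighter as it burns fuel, a constant thrust that exactly hovers at the start of a burn gradually produces a net upward acceleration, so a good policy has to keep adjusting throttle rather than follow a fixed schedule.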
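The state space grew to 14 dimensions: position (x, y, z), linear velocity (3), orientation quaternion (4), angular velocity (3), and remaining fuel mass (1). The bigger change was on the action side: five continuously throttled thrusters make a discrete action set impractical (even 5 throttle levels per thruster would give 5^5 = 3,125 actions), so I switched architectures.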
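One way such an observation vector might be assembled is sketched below. The ordering of the components and the (w, x, y, z) quaternion convention (MuJoCo's own) are assumptions about the layout, not a description of the project's code.

```python
import math
from typing import List, Sequence


def build_observation(position: Sequence[float], velocity: Sequence[float],
                      quaternion: Sequence[float], angular_velocity: Sequence[float],
                      fuel_mass: float) -> List[float]:
    """Flatten the lander state into a single observation vector.

    Hypothetical layout: position (3), linear velocity (3), orientation
    quaternion (4, as (w, x, y, z)), angular velocity (3), fuel mass (1).
    """
    for name, vec, size in (("position", position, 3), ("velocity", velocity, 3),
                            ("quaternion", quaternion, 4),
                            ("angular_velocity", angular_velocity, 3)):
        if len(vec) != size:
            raise ValueError(f"{name} must have {size} components, got {len(vec)}")

    norm = math.sqrt(sum(c * c for c in quaternion))
    if norm == 0.0:
        raise ValueError("quaternion must be non-zero")
    unit_q = [c / norm for c in quaternion]
    # q and -q encode the same orientation; forcing w >= 0 means the policy
    # never sees two different inputs for one physical attitude.
    if unit_q[0] < 0.0:
        unit_q = [-c for c in unit_q]

    return [*position, *velocity, *unit_q, *angular_velocity, fuel_mass]
```

The sign canonicalization is an optional design choice: without it, the network has to learn on its own that two opposite quaternions describe the same attitude.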
I switched to a Deep Deterministic Policy Gradient (DDPG) agent to handle the continuous action space. The key design choice was a hybrid PD-RL attitude controller: the RL agent outputs high-level 3D thrust and torque commands, while a PD controller handles low-level attitude stabilization across the 5 thrusters.
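DDPG trains a deterministic actor μ(s) and a critic Q(s, a), each paired with a slowly tracking target copy. The pieces that differ from D3QN are sketched below in plain Python, with parameters as flat lists instead of PyTorch tensors. The hyperparameters (γ = 0.99, τ = 0.005, θ = 0.15, σ = 0.2) are common defaults rather than values from this project, and the write-up does not say which exploration noise was used: Ornstein-Uhlenbeck noise is what the original DDPG paper used, and uncorrelated Gaussian noise is a common alternative.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def critic_target(reward: float, done: bool, next_state: Sequence[float],
                  target_actor: Callable[[Sequence[float]], Vector],
                  target_critic: Callable[[Sequence[float], Sequence[float]], float],
                  gamma: float = 0.99) -> float:
    """Bellman target y = r + gamma * Q'(s', mu'(s')) for one transition.

    Unlike DQN there is no max over actions: the target actor proposes one
    deterministic next action and the target critic scores it.
    """
    if done:
        return reward
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


def soft_update(target_params: Vector, online_params: Sequence[float],
                tau: float = 0.005) -> None:
    """Polyak averaging, in place: theta' <- tau * theta + (1 - tau) * theta'."""
    for i, p in enumerate(online_params):
        target_params[i] = tau * p + (1.0 - tau) * target_params[i]


class OrnsteinUhlenbeckNoise:
    """Mean-reverting, temporally correlated noise (Euler-Maruyama discretization)."""

    def __init__(self, size: int, theta: float = 0.15, sigma: float = 0.2,
                 dt: float = 1e-2, seed: int = 0) -> None:
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.state = [0.0] * size
        self.rng = random.Random(seed)

    def sample(self) -> Vector:
        for i, x in enumerate(self.state):
            self.state[i] = (x - self.theta * x * self.dt
                             + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0))
        return list(self.state)


def explore(action: Sequence[float], noise: Sequence[float],
            low: float = -1.0, high: float = 1.0) -> Vector:
    """Add exploration noise to the actor's output and clip to the action bounds."""
    return [min(max(a + n, low), high) for a, n in zip(action, noise)]
```

In a full training loop the critic is regressed toward `critic_target`, the actor is updated to increase the critic's score of its own actions, and `soft_update` is applied to both target networks after every gradient step.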
Add 3D landing demo video here: public/assets/
The final DDPG agent achieves reliable soft landings with fuel-efficient trajectories. The hybrid controller approach was essential — pure RL struggled with attitude control, while the PD layer provided the stability the RL agent could build on top of.
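A hybrid PD-RL controller could be wired up roughly as follows. Everything here is a hypothetical sketch: the gains, the thruster layout (one central engine plus four at ±x and ±y, all firing along the body z-axis), and the choice to simply sum the RL and PD torques are assumptions, since the write-up does not specify them. The PD term is the standard quaternion-feedback law τ = -Kp · 2 · sign(q_e,w) · q_e,vec - Kd · ω with q_e = q_des* ⊗ q, where the factor of 2 makes the error approximate the rotation angle for small errors.

```python
from typing import List, Sequence, Tuple

Quat = Tuple[float, float, float, float]  # (w, x, y, z), unit norm
Vec3 = Tuple[float, float, float]


def quat_conjugate(q: Sequence[float]) -> Quat:
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_multiply(a: Sequence[float], b: Sequence[float]) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def pd_attitude_torque(q: Sequence[float], q_des: Sequence[float],
                       omega: Sequence[float], kp: float = 4.0, kd: float = 2.0) -> Vec3:
    """PD torque pulling attitude q toward q_des and damping body rate omega."""
    qe = quat_multiply(quat_conjugate(q_des), q)  # error rotation, desired -> current
    sign = 1.0 if qe[0] >= 0.0 else -1.0           # take the shorter of the two rotations
    err = [2.0 * sign * c for c in qe[1:]]         # ~ rotation vector (rad) for small errors
    tx, ty, tz = (-kp * e - kd * wi for e, wi in zip(err, omega))
    return (tx, ty, tz)


def allocate_thrusters(collective: float, torque: Sequence[float],
                       arm: float = 1.0, max_thrust: float = 1000.0) -> List[float]:
    """Split collective thrust plus roll/pitch torque over 5 hypothetical thrusters.

    Order: [center, +x, -x, +y, -y], all firing along body +z at distance `arm`.
    A +y thruster makes positive roll torque (about x); a +x thruster makes
    negative pitch torque (about y). Yaw (torque[2]) is not achievable with this
    layout and is ignored. Clipping to [0, max_thrust] distorts the result when
    commands saturate.
    """
    base = collective / 5.0
    d_roll = torque[0] / (2.0 * arm)
    d_pitch = torque[1] / (2.0 * arm)
    raw = [base, base - d_pitch, base + d_pitch, base + d_roll, base - d_roll]
    return [min(max(f, 0.0), max_thrust) for f in raw]


def hybrid_command(rl_collective: float, rl_torque: Sequence[float],
                   q: Sequence[float], q_des: Sequence[float],
                   omega: Sequence[float]) -> List[float]:
    """RL supplies high-level thrust and torque; PD adds the stabilizing torque."""
    pd = pd_attitude_torque(q, q_des, omega)
    total = [r + p for r, p in zip(rl_torque, pd)]
    return allocate_thrusters(rl_collective, total)
```

The split mirrors the division of labor described above: the PD term reacts immediately to attitude error and body rates, so the RL policy does not have to rediscover basic stabilization and can focus on trajectory and fuel use. With purely vertical thrusters, full 3-axis control would require canted or lateral nozzles.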
This project deepened my understanding of sim-to-real transfer challenges and reinforced my interest in autonomous spacecraft systems. The next step would be domain randomization to prepare for transfer to actual flight hardware.
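Domain randomization means resampling the simulator's physical parameters every episode so the policy cannot overfit to one exact model. A minimal sketch of the sampling step follows. The chosen parameters and every range are hypothetical placeholders, not planned values. With MuJoCo's Python bindings, a sampled gravity could be written into `model.opt.gravity` before each reset.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PhysicsParams:
    gravity: float           # m/s^2
    dry_mass: float          # kg
    thrust_scale: float      # multiplier applied to every commanded thrust
    sensor_noise_std: float  # std of additive Gaussian observation noise


def sample_physics(rng: random.Random) -> PhysicsParams:
    """Draw one randomized set of simulator parameters for the next episode.

    All ranges are hypothetical placeholders for illustration.
    """
    return PhysicsParams(
        gravity=rng.uniform(1.55, 1.70),      # bracketing the nominal 1.62 m/s^2
        dry_mass=rng.uniform(900.0, 1100.0),  # +/-10% around a hypothetical 1000 kg
        thrust_scale=rng.uniform(0.9, 1.1),   # engine-to-engine performance spread
        sensor_noise_std=rng.uniform(0.0, 0.05),
    )


def noisy_observation(obs: Sequence[float], params: PhysicsParams,
                      rng: random.Random) -> List[float]:
    """Corrupt a clean observation with the episode's sensor noise level."""
    return [x + rng.gauss(0.0, params.sensor_noise_std) for x in obs]
```

A seeded `random.Random` keeps each randomized episode reproducible, which helps when debugging a policy that fails only under particular parameter draws.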