Deep Q-Network for Autonomous Lunar Landing

Teaching an RL agent to land on the Moon — from 2D prototypes to 3D physics simulations

The Problem

Autonomous lunar landing is one of the hardest control problems in aerospace: a lander must manage fuel, attitude, and descent rate simultaneously, with no atmosphere available for aerodynamic braking. I wanted to see if a reinforcement learning agent could learn this from scratch, with no pre-programmed knowledge of flight dynamics.

Phase 1: 2D Prototype

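I started with Gymnasium's LunarLander-v2 environment (Gymnasium is the Farama Foundation's maintained successor to OpenAI Gym), a simplified 2D problem with four discrete actions: do nothing, fire the left engine, fire the main engine, or fire the right engine. I implemented a Dueling Double Deep Q-Network (D3QN) in PyTorch that achieved a >95% landing success rate after training.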
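
The "Double" in D3QN only changes how the TD target is built: the online network selects the greedy next action and the target network scores it, which reduces the overestimation bias of the plain max-based DQN target. Below is a minimal NumPy sketch of that target, not the project's actual code; the function name `double_dqn_targets` and `gamma = 0.99` are illustrative assumptions, and the same indexing carries over to PyTorch tensors.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN TD targets for a batch of transitions (illustrative sketch).

    q_next_online, q_next_target: arrays of shape (batch, n_actions) holding
    Q(s', .) from the online and target networks respectively.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    # Selection: the online network picks the greedy next action.
    best_actions = np.argmax(q_next_online, axis=1)
    # Evaluation: the target network scores that action.
    rows = np.arange(len(best_actions))
    next_values = np.asarray(q_next_target, dtype=float)[rows, best_actions]
    # Terminal transitions (done = 1) do not bootstrap.
    return rewards + gamma * (1.0 - dones) * next_values


targets = double_dqn_targets(
    rewards=[1.0, 0.0],
    dones=[0.0, 1.0],
    q_next_online=[[1.0, 2.0], [3.0, 0.0]],
    q_next_target=[[5.0, 7.0], [9.0, 4.0]],
)
print(targets)  # first target is 1 + 0.99 * 7 = 7.93; second is terminal, so 0.0
```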

Key insight: the dueling architecture, which splits the Q-value into a state-value stream V(s) and an advantage stream A(s, a), dramatically improved learning stability compared to vanilla DQN.
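
A minimal NumPy sketch of a dueling forward pass, assuming LunarLander's 8-dimensional observation and 4 actions; the layer sizes, parameter names, and helper functions are hypothetical, and a PyTorch version would use `nn.Linear` layers with the same aggregation step. Subtracting the mean advantage makes V and A identifiable, so the average Q-value over actions equals V(s).

```python
import numpy as np

OBS_DIM, HIDDEN, N_ACTIONS = 8, 64, 4  # LunarLander: 8-dim observation, 4 actions


def init_params(rng):
    """Hypothetical small random weights for a shared trunk plus two heads."""
    return {
        "W_shared": rng.normal(0.0, 0.1, (OBS_DIM, HIDDEN)), "b_shared": np.zeros(HIDDEN),
        "W_value": rng.normal(0.0, 0.1, (HIDDEN, 1)), "b_value": np.zeros(1),
        "W_adv": rng.normal(0.0, 0.1, (HIDDEN, N_ACTIONS)), "b_adv": np.zeros(N_ACTIONS),
    }


def dueling_forward(params, obs):
    """Return (Q-values of shape (batch, n_actions), V of shape (batch, 1))."""
    h = np.maximum(0.0, obs @ params["W_shared"] + params["b_shared"])  # shared ReLU trunk
    value = h @ params["W_value"] + params["b_value"]                    # V(s)
    adv = h @ params["W_adv"] + params["b_adv"]                          # A(s, a)
    # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    q = value + adv - adv.mean(axis=1, keepdims=True)
    return q, value


rng = np.random.default_rng(0)
params = init_params(rng)
obs = rng.normal(size=(5, OBS_DIM))
q, v = dueling_forward(params, obs)
greedy_actions = q.argmax(axis=1)
```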

Phase 2: Custom 3D MuJoCo Environment

The 2D environment was a great proof of concept, but real lunar landers operate in 3D with continuous thrust control. I built a custom rigid-body simulation in MuJoCo with accurate 1.62 m/s² lunar gravity, modeling a lander with 5 individually controllable thrusters.
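
A MuJoCo model of this kind is usually declared in MJCF XML. The sketch below is hypothetical: the body size, 1000 kg mass, thruster positions, and force limits are illustrative, and only the 1.62 m/s² gravity and the count of five thrusters come from the project description. Each thruster is a `motor` actuator attached to a site, so its control value is a force along that site's local +z axis. In this simplified layout all five thrusters push along body +z, so it cannot produce yaw torque, and MuJoCo does not deplete mass on its own, so fuel would have to be tracked separately. The XML is checked with the standard library; compiling it requires the official `mujoco` Python bindings.

```python
import xml.etree.ElementTree as ET

# Hypothetical MJCF: one rigid lander under lunar gravity with five thruster sites.
LANDER_XML = """<mujoco model="lunar_lander_sketch">
  <option gravity="0 0 -1.62" timestep="0.005"/>
  <worldbody>
    <geom name="ground" type="plane" size="50 50 0.1"/>
    <body name="lander" pos="0 0 20">
      <freejoint/>
      <geom name="hull" type="box" size="1 1 0.5" mass="1000"/>
      <site name="thr_main" pos="0 0 -0.5"/>
      <site name="thr_px" pos="1 0 -0.5"/>
      <site name="thr_nx" pos="-1 0 -0.5"/>
      <site name="thr_py" pos="0 1 -0.5"/>
      <site name="thr_ny" pos="0 -1 -0.5"/>
    </body>
  </worldbody>
  <actuator>
    <!-- gear "0 0 1 0 0 0": ctrl is a force (N) along each site's local +z -->
    <motor name="main" site="thr_main" gear="0 0 1 0 0 0" ctrllimited="true" ctrlrange="0 4000"/>
    <motor name="px" site="thr_px" gear="0 0 1 0 0 0" ctrllimited="true" ctrlrange="0 1000"/>
    <motor name="nx" site="thr_nx" gear="0 0 1 0 0 0" ctrllimited="true" ctrlrange="0 1000"/>
    <motor name="py" site="thr_py" gear="0 0 1 0 0 0" ctrllimited="true" ctrlrange="0 1000"/>
    <motor name="ny" site="thr_ny" gear="0 0 1 0 0 0" ctrllimited="true" ctrlrange="0 1000"/>
  </actuator>
</mujoco>"""


def describe(xml=LANDER_XML):
    """Parse the MJCF and return (gravity vector, thruster actuator names)."""
    root = ET.fromstring(xml)
    gravity = [float(v) for v in root.find("option").get("gravity").split()]
    motors = [m.get("name") for m in root.find("actuator").findall("motor")]
    return gravity, motors


def load_lander():
    """Compile with the official bindings (pip install mujoco); imported lazily."""
    import mujoco
    model = mujoco.MjModel.from_xml_string(LANDER_XML)
    data = mujoco.MjData(model)
    return model, data


gravity, motors = describe()
```

With MuJoCo installed, `load_lander()` returns a model and data pair; setting `data.ctrl` to five thrust values and calling `mujoco.mj_step(model, data)` advances the simulation one timestep.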

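The state space expanded to 14 dimensions: position (x, y, z), linear velocity (3), orientation quaternion (4), angular velocity (3), and remaining fuel mass (1). With five continuously throttled thrusters, a discrete action set was no longer practical (even three throttle levels per thruster yields 3^5 = 243 joint actions), so I switched architectures.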
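
The listed state components add up to 3 + 3 + 4 + 3 + 1 = 14 values. A minimal sketch of packing them into one observation vector, assuming a (w, x, y, z) quaternion ordering (MuJoCo's convention); `pack_state` and `joint_discrete_actions` are hypothetical helpers, the second showing how a discretized action set grows with the number of thrusters.

```python
import numpy as np

# Hypothetical layout: name -> number of entries in the observation vector.
STATE_LAYOUT = {"pos": 3, "vel": 3, "quat_wxyz": 4, "ang_vel": 3, "fuel_mass": 1}
STATE_DIM = sum(STATE_LAYOUT.values())  # 3 + 3 + 4 + 3 + 1 = 14


def pack_state(pos, vel, quat_wxyz, ang_vel, fuel_mass):
    """Concatenate the lander state into one flat float vector."""
    quat = np.asarray(quat_wxyz, dtype=float)
    quat = quat / np.linalg.norm(quat)  # keep the orientation a unit quaternion
    state = np.concatenate([
        np.asarray(pos, dtype=float),
        np.asarray(vel, dtype=float),
        quat,
        np.asarray(ang_vel, dtype=float),
        np.array([fuel_mass], dtype=float),
    ])
    assert state.shape == (STATE_DIM,)
    return state


def joint_discrete_actions(levels_per_thruster, n_thrusters=5):
    """Size of the joint action set if every thruster is discretized."""
    return levels_per_thruster ** n_thrusters


s = pack_state(pos=[0, 0, 20], vel=[0, 0, -1], quat_wxyz=[2, 0, 0, 0],
               ang_vel=[0, 0, 0], fuel_mass=500.0)
```

The action count grows exponentially: 3 levels per thruster gives 243 joint actions, and 5 levels already gives 3125, which is why a continuous-action method fits this problem better.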

Phase 3: DDPG with Hybrid Control

I upgraded to a Deep Deterministic Policy Gradient (DDPG) agent, an off-policy actor-critic method designed for continuous action spaces. The key design choice was a hybrid PD-RL attitude controller: the RL agent outputs high-level 3D thrust and torque commands, while a PD controller handles low-level attitude stabilization across the 5 thrusters.
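
One way such a hybrid loop can be wired, sketched under explicit assumptions: the thruster layout (one central engine plus four at ±1 m along x and y, all pushing along body +z), the gains `kp` and `kd`, and every function name are hypothetical, and the per-thruster allocation uses a least-norm pseudo-inverse, a common choice that the project description does not specify. The policy supplies vertical thrust and roll/pitch torque; the PD term adds a restoring torque toward upright computed from the quaternion error.

```python
import numpy as np

# Hypothetical layout: body-frame (x, y) of 5 thrusters, all pushing along +z.
THRUSTER_XY = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

# B maps the 5 thruster forces to the body wrench [Fz, tau_x, tau_y].
# A force (0, 0, f) at r = (x, y, z) gives torque r x F = (y * f, -x * f, 0).
B = np.vstack([np.ones(5), THRUSTER_XY[:, 1], -THRUSTER_XY[:, 0]])
B_PINV = np.linalg.pinv(B)  # least-norm allocation (B has full row rank)


def pd_attitude_torque(quat_wxyz, ang_vel, kp=400.0, kd=150.0):
    """PD torque toward upright (identity quaternion); ang_vel assumed in body frame."""
    w, x, y, z = quat_wxyz
    sign = 1.0 if w >= 0.0 else -1.0        # q and -q are the same attitude: take the short way
    err = sign * np.array([x, y, z])         # vector part of the attitude error
    tau = -kp * err - kd * np.asarray(ang_vel, dtype=float)
    return tau[:2]                           # this layout has no yaw authority


def hybrid_command(rl_action, quat_wxyz, ang_vel, f_max=1000.0):
    """rl_action = [Fz, tau_x, tau_y] from the policy; PD torque is added on top."""
    fz = float(rl_action[0])
    tau = np.asarray(rl_action[1:3], dtype=float) + pd_attitude_torque(quat_wxyz, ang_vel)
    forces = B_PINV @ np.concatenate([[fz], tau])
    return np.clip(forces, 0.0, f_max)       # thrusters only push, and they saturate


hover = hybrid_command([1620.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
tilted_q = [np.cos(0.05), np.sin(0.05), 0.0, 0.0]  # small roll about body +x
recover = hybrid_command([1620.0, 0.0, 0.0], tilted_q, [0.0, 0.0, 0.0])
```

Upright with zero commanded torque, 1620 N of thrust is split evenly at 324 N per thruster; after a small positive roll, the PD term shifts force from the +y thruster to the -y thruster to roll the lander back.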

Training tricks that mattered:

Prioritized Experience Replay (PER), which replays high-TD-error transitions more often, to focus on rare failure modes
Soft target updates (tau = 0.005) instead of hard copies
Gradient clipping to prevent catastrophic policy updates
Shaped reward: fuel efficiency + slow terminal descent + upright stability
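
The four tricks above can be sketched in NumPy as follows. This is illustrative rather than the project's code: `soft_update` uses the tau = 0.005 from the list, the PER exponents alpha = 0.6 and beta = 0.4 are commonly used defaults assumed here, and the shaped-reward weights, the 1 m/s safe descent rate, and the 5 m "terminal" altitude are hypothetical. In PyTorch, gradient clipping is normally done with `torch.nn.utils.clip_grad_norm_`; the global-norm version below mirrors it.

```python
import numpy as np


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging in place: target <- tau * online + (1 - tau) * target."""
    for t, o in zip(target_params, online_params):
        t *= (1.0 - tau)
        t += tau * o


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional PER: P(i) = p_i^alpha / sum_k p_k^alpha, plus IS weights."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(priorities, dtype=float) ** alpha
    probs = scaled / scaled.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)  # correct the sampling bias
    weights /= weights.max()                        # largest weight is 1
    return idx, weights


def clip_by_global_norm(grads, max_norm=1.0):
    """Scale all gradients together so their combined L2 norm is <= max_norm."""
    total = float(np.sqrt(sum(np.sum(g * g) for g in grads)))
    scale = min(1.0, max_norm / (total + 1e-6))
    return [g * scale for g in grads], total


def shaped_reward(fuel_used, vz, altitude, tilt_rad,
                  w_fuel=0.3, w_descent=1.0, w_tilt=0.5,
                  safe_vz=-1.0, terminal_alt=5.0):
    """Hypothetical shaping: fuel cost + fast-terminal-descent penalty + tilt penalty."""
    r = -w_fuel * fuel_used
    if altitude < terminal_alt:
        r -= w_descent * max(0.0, safe_vz - vz)  # penalize sinking faster than 1 m/s
    r -= w_tilt * abs(tilt_rad)
    return r


target_w = np.zeros(3)
soft_update([target_w], [np.ones(3)])
idx, is_w = per_sample([10.0, 1.0, 1.0, 1.0], batch_size=8, rng=np.random.default_rng(0))
clipped, norm_before = clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0)
```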

Results & Takeaways

The final DDPG agent achieves reliable soft landings with fuel-efficient trajectories. The hybrid controller approach was essential: pure RL struggled with attitude control, while the PD layer provided the stability the RL agent could build on.

This project deepened my understanding of sim-to-real transfer challenges and reinforced my interest in autonomous spacecraft systems. The next step would be domain randomization to prepare for transfer to actual flight hardware.
