Hey friend
Watching a humanoid robot jump over things, climb stairs, do backflips, or get back up after being pushed never gets old.
What you see in those videos of Boston Dynamics' Atlas and Unitree's latest G1 or H1 robots is mostly thanks to Reinforcement Learning (RL). Specifically, those dynamic parkour-style behaviors are trained with RL.
In this article, we will explore how reinforcement learning is being used to teach humanoids advanced parkour skills, and what lessons we can learn from the pioneers in this space: Boston Dynamics and Unitree.
Why Parkour Is the Ultimate Test for Humanoids
Parkour pushes every limit of a humanoid robot. For example:
- It needs balance at high speeds.
- It requires precise foot placement and powerful push-off.
- It needs recovery from disturbances and failed landings.
- It needs whole-body coordination. This means arms, torso and legs work together.
- It needs to handle impacts and contact forces.
If a robot can do parkour well, it demonstrates dynamic stability, robust control, and impressive physical intelligence.
How Reinforcement Learning Powers Parkour
Unlike hand-programmed controllers, RL lets the robot learn through trial and error in simulation.
The typical training process looks like this:
- Create a simulation environment. This is usually done with MuJoCo or Isaac Lab.
- Define a reward function that encourages desired behaviors: forward progress, staying upright, minimizing energy use, or successful landings (see the sketch after this list).
- Train for millions to billions of steps across thousands of parallel environments.
- Use domain randomization: vary terrain, pushes, friction, and mass so the policy works in the real world.
- Fine-tune on the real robot. This step is also known as sim-to-real transfer.
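To make the reward idea concrete, here is a minimal sketch of what such a reward function might look like. The term weights and the fields on `state` are hypothetical, not taken from any specific codebase:

```python
import numpy as np

def parkour_reward(state, action):
    """Hypothetical per-step reward for a parkour policy.

    Assumes `state` exposes base velocity, projected gravity,
    joint torques, and the previous action; weights are illustrative.
    """
    # Reward forward progress along the x-axis.
    progress = 1.0 * state.base_lin_vel[0]

    # Reward staying upright: projected gravity z approaches -1 when vertical.
    uprightness = 0.5 * (-state.projected_gravity[2])

    # Penalize energy use (sum of squared joint torques).
    energy_penalty = -1e-4 * np.sum(np.square(state.joint_torques))

    # Penalize jerky actions to encourage smooth motion.
    smoothness_penalty = -0.01 * np.sum(np.square(action - state.prev_action))

    # One-time bonus for sticking a landing, large penalty for falling.
    landing_bonus = 10.0 if state.just_landed_successfully else 0.0
    fall_penalty = -20.0 if state.has_fallen else 0.0

    return (progress + uprightness + energy_penalty
            + smoothness_penalty + landing_bonus + fall_penalty)
```

In practice, most of the work goes into tuning these weights: too much progress reward and the robot lunges recklessly, too much energy penalty and it refuses to jump at all.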
Lessons from Boston Dynamics’ Atlas
Boston Dynamics has been the leader in humanoid movement for years, and their approach offers several key insights:
- They use whole-body RL. The policy does not just control the legs; it controls the entire body, with the arms used for counterbalancing and the torso for momentum management.
- They use curriculum learning. Training starts with walking, then gradually increases the difficulty: obstacles, faster speeds, stronger pushes.
- They use contact-rich RL. Their policies are trained to handle foot-ground and body-environment contacts.
- They use a hybrid of model-based and model-free control. They often combine Model Predictive Control (MPC) for low-level stability with RL for high-level behavior (a conceptual sketch follows this list).
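As a rough illustration of that split, here is a toy control loop in which an RL policy issues task-level commands at a low rate and a tracking controller (standing in for a real MPC) turns them into torques at a high rate. All class and field names here are hypothetical:

```python
import numpy as np

class HighLevelPolicy:
    """Stand-in for a trained RL policy that outputs task-level commands."""
    def act(self, state):
        # A real system would run a neural network forward pass here.
        # This toy version commands a fixed forward velocity and torso height.
        return {"target_vel": np.array([1.5, 0.0]), "target_height": 0.9}

class LowLevelController:
    """Stand-in for an MPC/whole-body controller that tracks commands."""
    def __init__(self, kp=80.0, kd=5.0):
        self.kp, self.kd = kp, kd

    def compute_torque(self, command, state):
        # Simple PD tracking of the commanded velocity, as a placeholder
        # for a real model-predictive optimization.
        vel_error = command["target_vel"][0] - state["base_vel"][0]
        return self.kp * vel_error - self.kd * state["base_vel"][0]

# Control loop: the policy runs every 10th tick, the controller every tick.
policy, controller = HighLevelPolicy(), LowLevelController()
state = {"base_vel": np.array([0.0, 0.0])}
for step in range(1000):
    if step % 10 == 0:  # e.g. policy at 100 Hz if the loop runs at 1 kHz
        command = policy.act(state)
    torque = controller.compute_torque(command, state)
    # ... apply torque in the simulator / on the robot and update `state` ...
```

The appeal of this split is that the low-level controller can enforce stability constraints even when the high-level policy asks for something aggressive.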
Atlas’s ability to do parkour, gymnastics and rapid recovery comes from years of iterating on reward design and training strategies.
Lessons from Unitree (G1 and H1)
Unitree has taken an aggressive and cost-effective approach with their G1 and H1 robots:
- They focus on extreme sim-to-real transfer. They put heavy emphasis on domain randomization during training so the policy transfers to the real robot with minimal fine-tuning (see the sketch after this list).
- They use speed-focused training. Unitree robots are trained to move and recover quickly, sometimes at the cost of elegance.
- They pair affordable hardware with aggressive RL. By using low-cost actuators and pushing RL hard, they have achieved impressive parkour performance at a lower price point than Atlas.
- They use simplified yet effective reward design. They focus on core objectives such as velocity tracking, torso uprightness, and foot clearance, and avoid complex reward terms.
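To illustrate what domain randomization can look like in practice, here is a minimal sketch that resamples physics parameters at the start of each episode. The parameter names, ranges, and `env` setters are illustrative assumptions, not Unitree's actual values or API:

```python
import random

# Illustrative ranges; real projects tune these per robot and simulator.
RANDOMIZATION_RANGES = {
    "friction":       (0.4, 1.2),    # ground friction coefficient
    "base_mass_kg":   (-1.0, 2.0),   # offset added to the nominal base mass
    "motor_strength": (0.8, 1.2),    # scale applied to commanded torques
    "push_force_n":   (0.0, 150.0),  # magnitude of random external pushes
}

def randomize_environment(env):
    """Sample new physics parameters for one training episode.

    `env` is a hypothetical simulation handle; adapt the setter calls
    to your simulator's API (MuJoCo, Isaac Lab, ...).
    """
    params = {name: random.uniform(lo, hi)
              for name, (lo, hi) in RANDOMIZATION_RANGES.items()}
    env.set_friction(params["friction"])
    env.add_base_mass_offset(params["base_mass_kg"])
    env.scale_motor_strength(params["motor_strength"])
    env.set_max_push_force(params["push_force_n"])
    return params
```

The intuition: a policy that has only ever seen one friction value will exploit it, while a policy trained across the whole range is forced to learn behavior that survives on real, messy ground.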
The G1’s ability to perform flips, jumps, and quick obstacle navigation shows how far RL has come on smaller, lighter platforms.
Key Challenges in RL for Humanoid Parkour
Despite these results, several hard problems remain:
- Reward hacking. The robot finds undesired ways to maximize reward; for example, it might crawl instead of walk if crawling still earns forward-progress reward.
- Sim-to-real gap. Behaviors that work perfectly in simulation often fail on hardware due to differences in friction, latency, or sensor noise (the wrapper sketch after this list shows one common mitigation).
- Safety. Highly dynamic policies that look fine in simulation can damage the robot when deployed if they are not carefully validated.
- Energy efficiency. Many parkour policies are still quite power-hungry.
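One common way to narrow the sim-to-real gap is to corrupt the simulated observations the way real sensors and actuators would. Below is a minimal gym-style wrapper sketch that adds Gaussian sensor noise and a fixed action latency; the noise level and delay are illustrative assumptions:

```python
from collections import deque
import numpy as np
import gymnasium as gym

class NoisyDelayedEnv(gym.Wrapper):
    """Adds sensor noise and action latency to better match real hardware."""

    def __init__(self, env, obs_noise_std=0.01, action_delay_steps=2):
        super().__init__(env)
        self.obs_noise_std = obs_noise_std
        self.action_queue = deque(maxlen=action_delay_steps + 1)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self.action_queue.clear()
        return self._noisy(obs), info

    def step(self, action):
        # Delay actions: the robot executes what was commanded a few steps ago.
        self.action_queue.append(action)
        delayed = self.action_queue[0]
        obs, reward, terminated, truncated, info = self.env.step(delayed)
        return self._noisy(obs), reward, terminated, truncated, info

    def _noisy(self, obs):
        # Gaussian sensor noise on every observation channel.
        return obs + np.random.normal(0.0, self.obs_noise_std, size=obs.shape)
```

A policy trained inside a wrapper like this cannot rely on instantaneous, perfect feedback, which is exactly the assumption that breaks on real hardware.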
My Personal Take
Reinforcement learning for humanoid parkour represents one of the most exciting frontiers in robotics right now. We have moved from hand-engineered walking controllers to robots that learn dynamic movements largely by themselves in simulation.
What impresses me most is how RL is forcing us to combine the best of both worlds:
- Classical robotics knowledge. This includes ZMP, contact mechanics, and optimal control.
- Modern deep learning techniques. This includes diffusion policies, transformers, and massive simulation.
Boston Dynamics shows what is possible with deep expertise and high-end hardware. Unitree shows how far you can go with pragmatic engineering and aggressive use of simulation and RL on more affordable platforms.
I believe the next big leap will come from hybrid approaches: RL for high-level creative behaviors, and classical controllers (MPC, Jacobian-based control) for low-level stability and safety.