Agile But Safe: Learning Collision-Free
High-Speed Legged Locomotion


Legged robots navigating cluttered environments must be jointly agile for efficient task execution and safe to avoid collisions with obstacles or humans. Existing studies either develop conservative controllers (< 1.0 m/s) to ensure safety, or focus on agility without considering potentially fatal collisions. This paper introduces Agile But Safe (ABS), a learning-based control framework that enables agile and collision-free locomotion for quadrupedal robots. ABS involves an agile policy to execute agile motor skills amidst obstacles and a recovery policy to prevent failures, collaboratively achieving high-speed and collision-free navigation. The policy switch in ABS is governed by a learned control-theoretic reach-avoid value network, which also guides the recovery policy as an objective function, thereby safeguarding the robot in a closed loop. The training process involves the learning of the agile policy, the reach-avoid value network, the recovery policy, and an exteroception representation network, all in simulation. These trained modules can be directly deployed in the real world with onboard sensing and computation, leading to high-speed and collision-free navigation in confined indoor and outdoor spaces with both static and dynamic obstacles.

Narrow Corridor

Indoor Hall with Furniture

Outdoor (Grass)

Outdoor (Snow)


Collision Avoidance with Diverse Objects

Agility Test

Robustness Test

Baseline Comparison


ABS (agile policy only)

Lagrangian methods


ABS framework

  1. Training architecture: There are four trained modules within the ABS framework:
    1. Agile Policy is trained to achieve maximum agility amidst obstacles;
    2. Reach-Avoid Value Network is trained to predict the RA values conditioned on the agile policy as safety indicators;
    3. Recovery Policy is trained to track desired twist commands (2D linear velocity and yaw angular velocity) that lower the RA values;
    4. Ray-Prediction Network is trained to predict ray distances as the policies' exteroceptive inputs given depth images.
  2. Deployment architecture: The dual policy setup switches between the agile policy and the recovery policy based on the estimated V̂ from the RA value network:
    1. If V̂ < V_threshold, the agile policy is activated to navigate amidst obstacles;
    2. If V̂ ≥ V_threshold, the recovery policy is activated to track twist commands that lower the RA values via constrained optimization.
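The deployment-time switch above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the reach-avoid value here is faked with a toy distance proxy, the threshold `V_THRESHOLD` is a hypothetical constant, and the recovery twist search uses plain finite-difference gradient descent with a box constraint as a stand-in for the paper's constrained optimization over the learned RA value network.

```python
import numpy as np

V_THRESHOLD = -0.05  # hypothetical switching threshold; tuned in practice


def ra_value(rays):
    """Stand-in for the learned reach-avoid value network.

    By convention, negative values mean the agile policy is predicted to
    stay collision-free; values near or above zero signal collision risk.
    Here we fake this with the closest ray distance, for illustration only.
    """
    return 0.5 - np.min(rays)  # toy proxy: unsafe once an obstacle is within 0.5 m


def select_policy(rays):
    """Dual-policy switch evaluated at every control step during deployment."""
    v_hat = ra_value(rays)
    if v_hat < V_THRESHOLD:
        return "agile", v_hat      # safe enough: keep running the agile policy
    return "recovery", v_hat       # unsafe: recovery policy must lower the RA value


def recovery_twist(v_hat_fn, twist0, lr=0.1, steps=20, eps=1e-3):
    """Pick a twist command (vx, vy, yaw rate) that lowers the RA value.

    Finite-difference gradient descent on v_hat_fn, with np.clip acting as a
    simple box constraint on the commanded twist (actuator limits).
    """
    twist = np.array(twist0, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(twist)
        for i in range(len(twist)):
            d = np.zeros_like(twist)
            d[i] = eps
            grad[i] = (v_hat_fn(twist + d) - v_hat_fn(twist - d)) / (2 * eps)
        twist = np.clip(twist - lr * grad, -1.0, 1.0)
    return twist
```

For example, with all rays reading beyond 0.55 m the switch stays on `"agile"`, while a 0.3 m reading flips it to `"recovery"`, whose twist command is then driven down the (learned, here mocked) RA value landscape.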


@inproceedings{he2024agile,
  author    = {He, Tairan and Zhang, Chong and Xiao, Wenli and He, Guanqi and Liu, Changliu and Shi, Guanya},
  title     = {Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion},
  booktitle = {arXiv},
  year      = {2024},
}