Multimodal Sensor Fusion for Quadruped Locomotion
Learning-based locomotion has achieved remarkable progress through reinforcement learning (RL), enabling legged robots to traverse complex terrains without hand-engineered controllers. However, most state-of-the-art approaches train “blind” agents via extensive domain randomization, relying solely on proprioceptive feedback (e.g., joint positions, contact sensors) to react to obstacles and uneven ground [1]. While this reactive capability is useful, it cannot anticipate upcoming terrain changes. In this project, we will augment an RL agent with visual observations (e.g., depth or RGB imagery), allowing it to perceive obstacles and terrain features several steps ahead and to plan proactive maneuvers. By learning to combine vision and proprioception, the agent can achieve more efficient, robust, and versatile locomotion over previously unseen, challenging environments.
Main Objectives
- Literature Survey
  - Review recent RL-based locomotion methods that use proprioception and/or vision.
  - Identify limitations of purely reactive controllers and the benefits demonstrated by vision-augmented agents.
- Simulated Environment & Sensor Suite
  - Build a suite of terrain scenarios in simulation (NVIDIA Isaac Sim [2]) featuring obstacles, slopes, and rough surfaces.
  - Integrate realistic visual sensors (depth camera and/or stereo RGB) alongside proprioceptive inputs.
- Agent Architecture & Training Pipeline
  - Design a neural policy that fuses visual and proprioceptive streams (e.g., convolutional encoders + recurrent modules); see the policy sketch after this list.
  - Train the agent with RL algorithms known for stability in high-dimensional observation spaces (e.g., PPO, SAC).
- Evaluation & Comparative Analysis
  - Evaluate proactive versus reactive policies on metrics such as success rate, energy efficiency, and gait stability, both in simulation and on the real robot (Unitree Go2); a sketch of the energy metric also follows this list.
  - Perform ablation studies to quantify the contribution of visual inputs under different environmental conditions (e.g., low vs. high obstacle density).
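To make the fusion objective above concrete, the following is a minimal sketch (in PyTorch) of one possible actor-critic module that encodes a depth image with a small CNN, embeds the proprioceptive vector with an MLP, and fuses both streams through a GRU. The `FusionPolicy` name, the layer sizes, and the assumption of a single 64x64 depth image plus a 48-dimensional proprioceptive state are illustrative placeholders, not project requirements.

```python
# Minimal sketch of a vision-proprioception fusion policy (PyTorch).
# Shapes, layer sizes, and names are illustrative assumptions.
import torch
import torch.nn as nn


class FusionPolicy(nn.Module):
    """Actor-critic policy fusing a depth image with proprioceptive state."""

    def __init__(self, proprio_dim=48, action_dim=12, hidden_dim=256):
        super().__init__()
        # Convolutional encoder for a 1 x 64 x 64 depth image.
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ELU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, 128), nn.ELU(),
        )
        # MLP encoder for joint positions/velocities, IMU, contact flags, etc.
        self.proprio_encoder = nn.Sequential(nn.Linear(proprio_dim, 128), nn.ELU())
        # Recurrent fusion of the two streams over time.
        self.gru = nn.GRU(128 + 128, hidden_dim, batch_first=True)
        self.actor = nn.Linear(hidden_dim, action_dim)  # e.g., mean of a Gaussian over joint targets
        self.critic = nn.Linear(hidden_dim, 1)          # state-value estimate

    def forward(self, depth, proprio, hidden=None):
        # depth: (B, T, 1, 64, 64), proprio: (B, T, proprio_dim)
        B, T = depth.shape[:2]
        vis = self.vision_encoder(depth.flatten(0, 1)).view(B, T, -1)
        prop = self.proprio_encoder(proprio)
        fused, hidden = self.gru(torch.cat([vis, prop], dim=-1), hidden)
        return self.actor(fused), self.critic(fused), hidden
```

In a PPO-style setup the actor head would parameterize the action distribution and the critic head the value function; the same module could be adapted to SAC with minor changes.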
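For the energy-efficiency metric mentioned under Evaluation & Comparative Analysis, one common choice is the mechanical cost of transport. The helper below is a hedged sketch of how it could be computed from logged joint torques and velocities; the function name and the assumed logging format are placeholders.

```python
import numpy as np


def cost_of_transport(torques, joint_velocities, mass_kg, distance_m, dt, g=9.81):
    """Mechanical cost of transport: energy spent per unit weight per unit distance travelled.

    torques, joint_velocities: arrays of shape (num_steps, num_joints) logged over a rollout.
    Lower values indicate a more energy-efficient gait.
    """
    power = np.abs(torques * joint_velocities).sum(axis=1)  # instantaneous mechanical power per step
    energy = power.sum() * dt                               # integrate over the rollout (Joules)
    return energy / (mass_kg * g * distance_m)
```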
Deliverables
- Simulation Testbed: A collection of parametrized terrain maps and corresponding sensor data streams for reproducible training and evaluation (see the terrain sketch below).
- Fusion-based Locomotion Framework: Well-documented code implementing the vision-proprioception policy, training scripts, and evaluation tools.
- Master’s Thesis: A comprehensive document detailing background, methodology, experiments, discussion of results, and proposed future directions.
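As one possible starting point for the parametrized terrain maps in the Simulation Testbed deliverable, the sketch below generates a simple heightfield from a few difficulty parameters. It is simulator-agnostic and purely illustrative: the parameter names are assumptions, and in practice the array would be imported into Isaac Sim (or produced with the simulator's own terrain utilities) rather than used directly.

```python
import numpy as np


def make_terrain(size=128, cell_m=0.05, slope=0.1, roughness_m=0.02,
                 step_height_m=0.10, num_obstacles=10, rng=None):
    """Generate a parametrized heightfield (in meters) for locomotion benchmarks.

    size: grid resolution (size x size cells); cell_m: cell edge length;
    slope: rise per meter along x; roughness_m: amplitude of random surface noise;
    num_obstacles: number of box obstacles of height step_height_m.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.arange(size) * cell_m
    height = np.tile(slope * x, (size, 1))                          # global slope along x
    height += rng.uniform(-roughness_m, roughness_m, (size, size))  # surface roughness
    for _ in range(num_obstacles):                                  # scattered box-shaped obstacles
        cx, cy = rng.integers(0, size - 8, size=2)
        w, h = rng.integers(4, 8, size=2)
        height[cy:cy + h, cx:cx + w] += step_height_m
    return height
```

Sweeping these parameters (and the random seed) yields a reproducible family of terrains for both training curricula and held-out evaluation maps.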
[1] Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers rchalyang.github.io/LocoTransformer/
Requirements
Mandatory:
- Strong programming skills in Python
- Proficient background in robot perception
- Proficient background in reinforcement learning
- Background in Linux systems
- Background in robot simulation, such as Isaac Sim
- Familiarity with deep learning frameworks, such as TensorFlow or PyTorch
Optional:
- Familiarity with ROS 2
Thesis Type
Master’s thesis (Masterarbeit)
Contact
Building 5501, Room 2.106
+49 (89) 289 - 55183