Multimodal Sensor Fusion for Quadruped Locomotion
Learning-based locomotion has achieved remarkable progress through reinforcement learning (RL), enabling legged robots to traverse complex terrains without hand-engineered controllers. However, most state-of-the-art approaches train “blind” agents via extensive domain randomization, relying solely on proprioceptive feedback (e.g., joint positions, contact sensors) to react to obstacles and uneven ground [1]. While this reactive capability is useful, it cannot anticipate upcoming terrain changes. In this project, we will augment an RL agent with visual observations (e.g., depth or RGB imagery), allowing it to perceive obstacles and terrain features several steps ahead and to plan proactive maneuvers. By learning to combine vision and proprioception, the agent can achieve more efficient, robust, and versatile locomotion over previously unseen, challenging environments.
Main Objectives
- Literature Survey
  - Review recent RL-based locomotion methods that use proprioception and/or vision.
  - Identify limitations of purely reactive controllers and the benefits demonstrated by vision-augmented agents.
- Simulated Environment & Sensor Suite
  - Build a suite of terrain scenarios in simulation (NVIDIA Isaac Sim [2]) featuring obstacles, slopes, and rough surfaces.
  - Integrate realistic visual sensors (depth camera and/or stereo RGB) alongside proprioceptive inputs.
- Agent Architecture & Training Pipeline
  - Design a neural policy that fuses visual and proprioceptive streams (e.g., convolutional encoders + recurrent modules); see the policy sketch after this list.
  - Train the agent with RL algorithms known for stability in high-dimensional observation spaces (e.g., PPO, SAC).
- Evaluation & Comparative Analysis
  - Evaluate proactive versus reactive policies on metrics such as success rate, energy efficiency, and gait stability, both in simulation and on the real robot (Unitree Go2); a sketch of the energy metric also follows this list.
  - Perform ablation studies to quantify the contribution of visual inputs under different environmental conditions (e.g., low vs. high obstacle density).
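To make the fusion objective above concrete, the following is a minimal sketch (in PyTorch) of one possible actor-critic module that encodes a depth image with a small CNN, embeds the proprioceptive vector with an MLP, and fuses both streams through a GRU. The `FusionPolicy` name, the layer sizes, and the assumption of a single 64x64 depth image plus a 48-dimensional proprioceptive state are illustrative placeholders, not project requirements.

```python
# Minimal sketch of a vision-proprioception fusion policy (PyTorch).
# Shapes, layer sizes, and names are illustrative assumptions.
import torch
import torch.nn as nn


class FusionPolicy(nn.Module):
    """Actor-critic policy fusing a depth image with proprioceptive state."""

    def __init__(self, proprio_dim=48, action_dim=12, hidden_dim=256):
        super().__init__()
        # Convolutional encoder for a 1 x 64 x 64 depth image.
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2), nn.ELU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ELU(),
            nn.Conv2d(32, 32, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, 128), nn.ELU(),
        )
        # MLP encoder for joint positions/velocities, IMU, contact flags, etc.
        self.proprio_encoder = nn.Sequential(nn.Linear(proprio_dim, 128), nn.ELU())
        # Recurrent fusion of the two streams over time.
        self.gru = nn.GRU(128 + 128, hidden_dim, batch_first=True)
        self.actor = nn.Linear(hidden_dim, action_dim)  # e.g., mean of a Gaussian over joint targets
        self.critic = nn.Linear(hidden_dim, 1)          # state-value estimate

    def forward(self, depth, proprio, hidden=None):
        # depth: (B, T, 1, 64, 64), proprio: (B, T, proprio_dim)
        B, T = depth.shape[:2]
        vis = self.vision_encoder(depth.flatten(0, 1)).view(B, T, -1)
        prop = self.proprio_encoder(proprio)
        fused, hidden = self.gru(torch.cat([vis, prop], dim=-1), hidden)
        return self.actor(fused), self.critic(fused), hidden
```

In a PPO-style setup the actor head would parameterize the action distribution and the critic head the value function; the same module could be adapted to SAC with minor changes.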
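For the energy-efficiency metric mentioned under Evaluation & Comparative Analysis, one common choice is the mechanical cost of transport. The helper below is a hedged sketch of how it could be computed from logged joint torques and velocities; the function name and the assumed logging format are placeholders.

```python
import numpy as np


def cost_of_transport(torques, joint_velocities, mass_kg, distance_m, dt, g=9.81):
    """Mechanical cost of transport: energy spent per unit weight per unit distance travelled.

    torques, joint_velocities: arrays of shape (num_steps, num_joints) logged over a rollout.
    Lower values indicate a more energy-efficient gait.
    """
    power = np.abs(torques * joint_velocities).sum(axis=1)  # instantaneous mechanical power per step
    energy = power.sum() * dt                               # integrate over the rollout (Joules)
    return energy / (mass_kg * g * distance_m)
```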
Deliverables
- Simulation Testbed: A collection of parametrized terrain maps and corresponding sensor data streams for reproducible training and evaluation (see the terrain sketch below).
- Fusion-based Locomotion Framework: Well-documented code implementing the vision-proprioception policy, training scripts, and evaluation tools.
- Master’s Thesis: A comprehensive document detailing background, methodology, experiments, discussion of results, and proposed future directions.
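As one possible starting point for the parametrized terrain maps in the Simulation Testbed deliverable, the sketch below generates a simple heightfield from a few difficulty parameters. It is simulator-agnostic and purely illustrative: the parameter names are assumptions, and in practice the array would be imported into Isaac Sim (or produced with the simulator's own terrain utilities) rather than used directly.

```python
import numpy as np


def make_terrain(size=128, cell_m=0.05, slope=0.1, roughness_m=0.02,
                 step_height_m=0.10, num_obstacles=10, rng=None):
    """Generate a parametrized heightfield (in meters) for locomotion benchmarks.

    size: grid resolution (size x size cells); cell_m: cell edge length;
    slope: rise per meter along x; roughness_m: amplitude of random surface noise;
    num_obstacles: number of box obstacles of height step_height_m.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.arange(size) * cell_m
    height = np.tile(slope * x, (size, 1))                          # global slope along x
    height += rng.uniform(-roughness_m, roughness_m, (size, size))  # surface roughness
    for _ in range(num_obstacles):                                  # scattered box-shaped obstacles
        cx, cy = rng.integers(0, size - 8, size=2)
        w, h = rng.integers(4, 8, size=2)
        height[cy:cy + h, cx:cx + w] += step_height_m
    return height
```

Sweeping these parameters (and the random seed) yields a reproducible family of terrains for both training curricula and held-out evaluation maps.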
[1] Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers rchalyang.github.io/LocoTransformer/
Requirements
Mandatory:
- Strong programming skills in Python
- Proficient background in robot perception
- Proficient background in reinforcement learning
- Background in Linux systems
- Background in robot simulation, such as Isaac Sim
- Familiarity with deep learning frameworks, such as TensorFlow or PyTorch
Optional:
- Familiarity with ROS 2
Thesis Type
Master’s thesis (Masterarbeit)
Contact
Building 5501, Room 2.106
+49 (89) 289 - 55183