Vision-Language-Action Navigation for Humanoids

Adapting and deploying VLA models for embodied humanoid navigation.

This work explores how to adapt large-scale vision-language-action (VLA) models for navigation on a real humanoid robot.

Project highlights:

  • Leveraged state-of-the-art VLA policies as expert priors for humanoid navigation.
  • Improved navigation performance through RL fine-tuning and iterative policy distillation (see the distillation sketch after this list).
  • Integrated the updated policies into the robot's software pipeline and deployed them via ROS (see the node sketch below).
  • Evaluated policy robustness under changing scene geometry and task objectives.

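One way to realize the iterative distillation step is a DAgger-style loop: roll out the compact student policy, relabel the states it visits with actions from the frozen VLA expert, and regress the student onto the aggregated labels. The sketch below is a minimal, simplified illustration of that recipe, not the project's actual code; `env`, `expert`, the MLP student, and all hyperparameters are placeholder assumptions (a real student would consume images and language, not a flat observation vector).

```python
# DAgger-style iterative distillation sketch (all names are illustrative).
import torch
import torch.nn as nn

class StudentPolicy(nn.Module):
    """Small MLP stand-in for a compact student policy."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def distill(env, expert, student, iterations=10, rollout_len=200, epochs=5):
    """Each iteration: roll out the *student*, relabel visited states with
    the expert's actions, then regress the student onto the growing dataset."""
    opt = torch.optim.Adam(student.parameters(), lr=3e-4)
    dataset_obs, dataset_act = [], []
    for _ in range(iterations):
        obs = env.reset()  # hypothetical env returning observation tensors
        for _ in range(rollout_len):
            with torch.no_grad():
                expert_act = expert(obs)      # expert prior provides the label
                student_act = student(obs)
            dataset_obs.append(obs)
            dataset_act.append(expert_act)
            obs = env.step(student_act)       # follow the student's own actions
        obs_batch = torch.stack(dataset_obs)
        act_batch = torch.stack(dataset_act)
        for _ in range(epochs):               # behavior-cloning regression
            loss = nn.functional.mse_loss(student(obs_batch), act_batch)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```

Rolling out the student (rather than the expert) is the key design choice: it exposes the student to its own state distribution, which is what lets distillation correct compounding errors.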
The goal is to combine foundation-model priors with task-specific control refinement for better embodied autonomy.
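For the ROS deployment, one common pattern is to wrap the trained policy in a node that subscribes to sensor topics and publishes velocity commands on a fixed control loop. The sketch below assumes ROS 2 (rclpy); the topic names (`/camera/image_raw`, `/cmd_vel`), the 10 Hz rate, and the `policy` callable are illustrative assumptions, not the project's actual interfaces.

```python
# Minimal rclpy deployment sketch (topic names and policy are placeholders).
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist

class PolicyNode(Node):
    def __init__(self, policy):
        super().__init__('vla_nav_policy')
        self.policy = policy
        self.latest_image = None
        self.sub = self.create_subscription(
            Image, '/camera/image_raw', self.on_image, 10)
        self.pub = self.create_publisher(Twist, '/cmd_vel', 10)
        self.timer = self.create_timer(0.1, self.step)  # 10 Hz control loop

    def on_image(self, msg: Image) -> None:
        # Cache the most recent observation; inference runs on the timer.
        self.latest_image = msg

    def step(self) -> None:
        if self.latest_image is None:
            return
        # Policy maps the latest observation to (linear, angular) velocity.
        v, w = self.policy(self.latest_image)
        cmd = Twist()
        cmd.linear.x = float(v)
        cmd.angular.z = float(w)
        self.pub.publish(cmd)

def main():
    rclpy.init()
    node = PolicyNode(policy=lambda img: (0.2, 0.0))  # stub policy for testing
    rclpy.spin(node)

if __name__ == '__main__':
    main()
```

Decoupling image callbacks from the control timer keeps command publishing at a steady rate even when inference or sensor frames arrive irregularly.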