Humanoid Hanoi: Investigating Shared Whole-Body Control for Skill-Based Box Rearrangement

Abstract

We investigate a skill-based framework for humanoid box rearrangement that enables long-horizon execution by sequencing reusable skills at the task level. In our architecture, all skills execute through a shared, task-agnostic whole-body controller (WBC), providing a consistent closed-loop interface for skill composition, in contrast to non-shared designs that use separate low-level controllers per skill.

We find that naively reusing the same pretrained WBC can reduce robustness over long horizons, as new skills and their compositions induce shifted state and command distributions. We address this with a simple data aggregation procedure that augments shared-WBC training with rollouts from closed-loop skill execution under domain randomization.

To evaluate the approach, we introduce Humanoid Hanoi, a long-horizon Tower-of-Hanoi box rearrangement benchmark, and report results in simulation and on the Digit V3 humanoid robot, demonstrating fully autonomous rearrangement over extended horizons and quantifying the benefits of the shared-WBC approach over non-shared baselines.
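To give a sense of the task horizon, the sketch below expands a Tower-of-Hanoi instance into a sequence of task-level skills using the standard recursive solution. It is illustrative only: the stack labels `A`, `B`, `C`, the function names, the number of boxes, and the assumption that each move decomposes into exactly one pickup and one place skill are ours, not details taken from the benchmark.

```python
from typing import List, Tuple

def hanoi_moves(n: int, source: str = "A", target: str = "C",
                spare: str = "B") -> List[Tuple[str, str]]:
    """Standard recursive Tower-of-Hanoi solution: 2**n - 1 moves for n boxes."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi_moves(n - 1, spare, target, source))

def skill_sequence(n: int) -> List[Tuple[str, str]]:
    """Expand each move into a (hypothetical) pickup skill and place skill."""
    sequence = []
    for src, dst in hanoi_moves(n):
        sequence.append(("pickup", src))
        sequence.append(("place", dst))
    return sequence

if __name__ == "__main__":
    for step in skill_sequence(3):
        print(step)
```

Under these assumptions, three boxes require 7 moves and therefore 14 consecutive skill executions, so any degradation at a skill boundary compounds over the episode.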

Shared WBC Architecture

Our architecture uses a shared, task-agnostic whole-body controller (WBC) that executes all skills through a unified low-level control interface. This enables skill reuse and simplifies composition without switching control laws at skill boundaries. Independently trained high-level skills generate task-level commands that are executed through the shared WBC, which produces joint-level PD targets tracked by a low-level PD controller on the robot.
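The control flow described above can be sketched as a single step function in which every skill passes through the same controller. This is a minimal sketch under stated assumptions: the `Skill` and `WBC` type aliases, the toy skills, the placeholder `toy_wbc` (which stands in for the learned controller), and the gains `kp` and `kd` are all hypothetical; only the standard PD law is taken as given.

```python
from typing import Callable, List, Sequence

# Hypothetical interfaces: a skill maps an observation to a task-level command;
# the shared WBC maps (observation, command) to joint-level PD targets.
Skill = Callable[[Sequence[float]], List[float]]
WBC = Callable[[Sequence[float], Sequence[float]], List[float]]

def pd_torques(q_target: Sequence[float], q: Sequence[float],
               qd: Sequence[float], kp: float, kd: float) -> List[float]:
    """Standard per-joint PD law: tau = kp * (q_target - q) - kd * qd."""
    return [kp * (qt - qi) - kd * qdi for qt, qi, qdi in zip(q_target, q, qd)]

def control_step(skill: Skill, wbc: WBC, observation: Sequence[float],
                 q: Sequence[float], qd: Sequence[float],
                 kp: float = 10.0, kd: float = 1.0) -> List[float]:
    """One control step: skill -> shared WBC -> low-level PD controller."""
    command = skill(observation)            # task-level command
    q_target = wbc(observation, command)    # joint-level PD targets
    return pd_torques(q_target, q, qd, kp, kd)

# Toy stand-ins (assumptions, not the learned policies).
def pickup_skill(observation: Sequence[float]) -> List[float]:
    return [0.5, 0.0]

def place_skill(observation: Sequence[float]) -> List[float]:
    return [-0.5, 0.0]

def toy_wbc(observation: Sequence[float], command: Sequence[float]) -> List[float]:
    return list(command)  # placeholder mapping; a real WBC is a trained policy

if __name__ == "__main__":
    q, qd = [0.0, 0.0], [0.0, 0.0]
    # Switching skills changes only the command source, never the controller.
    for skill in (pickup_skill, place_skill):
        print(control_step(skill, toy_wbc, [0.0], q, qd))
```

Because `control_step` takes the skill as an argument and always calls the same `wbc`, a skill boundary changes the command stream but not the control law, which is the property the shared design relies on.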

We extend this architecture with a rollout-based coverage expansion method: closed-loop rollouts of composed skill execution in simulation are aggregated into the training data and used to further train the shared WBC. This simple data aggregation procedure improves robustness over long horizons while preserving a single, task-agnostic whole-body controller shared across all skills.
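The aggregation step can be illustrated with a toy loop that collects state and command pairs from closed-loop rollouts under randomized parameters and appends them to an existing dataset. Everything specific here is an assumption made for illustration: the one-dimensional state, the toy dynamics, the randomization ranges for box mass and friction, and the function names. The retraining step itself is omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (state, command); hypothetical 1-D stand-ins

def rollout(skills: Sequence[Callable[[float], float]], rng: random.Random,
            steps_per_skill: int = 5) -> List[Sample]:
    """Run the skills back to back in closed loop and log what the WBC sees."""
    # Domain randomization: these ranges are illustrative assumptions.
    box_mass = rng.uniform(0.5, 3.0)
    friction = rng.uniform(0.4, 1.0)
    state, samples = 0.0, []
    for skill in skills:
        for _ in range(steps_per_skill):
            command = skill(state)          # command depends on current state
            samples.append((state, command))
            state += 0.1 * command * friction / box_mass  # toy dynamics
    return samples

def aggregate(base_dataset: Sequence[Sample],
              skills: Sequence[Callable[[float], float]],
              num_rollouts: int, seed: int = 0) -> List[Sample]:
    """Return the base dataset augmented with samples from composed rollouts."""
    rng = random.Random(seed)
    dataset = list(base_dataset)
    for _ in range(num_rollouts):
        dataset.extend(rollout(skills, rng))
    return dataset

# Toy skills (assumptions): drive the state toward 1.0, then back toward 0.0.
def pickup(state: float) -> float:
    return 1.0 - state

def place(state: float) -> float:
    return -state

if __name__ == "__main__":
    base = [(0.0, 0.0)] * 10
    augmented = aggregate(base, [pickup, place], num_rollouts=4)
    print(len(base), len(augmented))
```

The second skill in each rollout starts from wherever the first one left the system, so the logged samples include the shifted states and commands that arise only under composition.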

System overview of the shared WBC architecture

Humanoid Hanoi Simulation Success

Humanoid Hanoi Simulation Failure

Hardware Pickup and Place Skills

Robust to variations in target height, target position, box mass, and box dimensions.

Humanoid Hanoi Hardware Success

Configuration 1

Configuration 2

Configuration 3

Humanoid Hanoi Hardware Failure

Perception and box state estimation error

Localization error

Physical interaction during placement

Unstable stand

BibTeX

@misc{kim2026humanoidhanoiinvestigatingshared,
      title={Humanoid Hanoi: Investigating Shared Whole-Body Control for Skill-Based Box Rearrangement}, 
      author={Minku Kim and Kuan-Chia Chen and Aayam Shrestha and Li Fuxin and Stefan Lee and Alan Fern},
      year={2026},
      eprint={2602.13850},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.13850}, 
}