ManipArena
ManipArena (SYSU x MBZUAI) · 2026 · datasets.bot
One-liner. Real-world reasoning-oriented manipulation benchmark (CVPR 2026): 20 tasks, 10.8k trajectories, ~188h, 5 robots, LeRobot v2.1.
Setup
- Datasets / benchmarks: ManipArena is a comprehensive real-world benchmark for evaluating reasoning-oriented generalist robot manipulation, run as a CVPR 2026 Embodied AI Workshop challenge. It provides 10,812 expert trajectories (~188 hours, 13.5M frames) across 20 real tasks spanning execution-reasoning, semantic-reasoning, and long-horizon mobile manipulation, plus 3 synchronized real-to-sim tasks built via 3D scanning for fair Vision-Language-Action and world-model comparison. Demonstrations are recorded on 5 robot platforms with 3 synchronized RGB cameras (one overhead + two wrist), 56-D proprioception (joint positions/velocities/currents), gripper and mobile-base states, and three-level language annotations, in LeRobot v2.1 format. License: Apache-2.0. Download: https://huggingface.co/datasets/ManipArena/maniparena-dataset.
- Hardware / simulator: Embodiment: parallel_jaw, aloha. Environment: home, simulation, tabletop. Realness: both (real-world and simulated).
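As a quick sanity check on the headline figures above, the short sketch below derives the implied recording rate and the average trajectory length from the three quoted totals. It assumes the 13.5M frame count is per timestep rather than summed over the three cameras, and the inputs are the rounded values from this entry, so the results are only approximate.

```python
# Headline figures quoted in this entry (rounded in the source).
NUM_TRAJECTORIES = 10_812
TOTAL_HOURS = 188.0         # "~188 hours"
TOTAL_FRAMES = 13_500_000   # "13.5M frames"


def summarize(num_traj: int, hours: float, frames: int) -> dict:
    """Derive rough per-trajectory statistics from the dataset totals."""
    seconds = hours * 3600.0
    return {
        # Assumption: frames are counted once per timestep, not per camera.
        "implied_fps": frames / seconds,
        "mean_frames_per_trajectory": frames / num_traj,
        "mean_seconds_per_trajectory": seconds / num_traj,
    }


stats = summarize(NUM_TRAJECTORIES, TOTAL_HOURS, TOTAL_FRAMES)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the totals are consistent with recording at roughly 20 Hz and an average demonstration of about one minute. The actual frame rate should be read from the dataset metadata rather than inferred this way.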
Schema
LeRobot v2.1: 3 synchronized RGB streams (overhead + 2 wrist), 56-D proprioception (joint pos/vel/currents), end-effector + gripper + mobile-base states, three-level language annotations.
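To make the on-disk layout concrete, here is a minimal sketch of how per-episode files are usually located in a LeRobot v2.x dataset: one Parquet file of low-dimensional data per episode and one MP4 per camera stream, grouped into chunks. The path templates and chunk size below are the ones typically stored in a v2.x `meta/info.json`, and the three camera keys are hypothetical names for the overhead and wrist streams. ManipArena's own `meta/info.json` is the authority for all of them.

```python
# Path templates as they typically appear in a LeRobot v2.x meta/info.json.
# Assumption: ManipArena's actual info.json may use different values.
DATA_PATH = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"
VIDEO_PATH = (
    "videos/chunk-{episode_chunk:03d}/{video_key}/episode_{episode_index:06d}.mp4"
)
CHUNKS_SIZE = 1000  # assumed number of episodes per chunk

# Hypothetical camera keys for the one overhead and two wrist RGB streams.
CAMERA_KEYS = [
    "observation.images.overhead",
    "observation.images.wrist_left",
    "observation.images.wrist_right",
]


def episode_files(episode_index, chunks_size=CHUNKS_SIZE, camera_keys=CAMERA_KEYS):
    """Return (parquet_path, [video_paths]) for one episode, relative to the repo root."""
    chunk = episode_index // chunks_size
    data = DATA_PATH.format(episode_chunk=chunk, episode_index=episode_index)
    videos = [
        VIDEO_PATH.format(
            episode_chunk=chunk, video_key=key, episode_index=episode_index
        )
        for key in camera_keys
    ]
    return data, videos


parquet_path, video_paths = episode_files(1234)
print(parquet_path)
for path in video_paths:
    print(path)
```

Resolving paths this way allows fetching a single episode from the Hugging Face repository instead of the full download, provided the templates match the published metadata.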
Links