CALVIN

University of Freiburg · 2022 · datasets.bot

One-liner. Simulated benchmark for long-horizon, language-conditioned Franka Panda manipulation: 4 envs, 34 tasks, 24h of play.

Setup

Datasets / benchmarks: CALVIN (Composing Actions from Language and Vision) is an open-source, PyBullet-simulated benchmark for learning long-horizon, language-conditioned, continuous-control manipulation policies with a Franka Panda arm. It spans four tabletop environments (A, B, C, D), 34 manipulation tasks, and ~1000 crowd-sourced language annotations. Evaluation rollouts chain five consecutive language-specified sub-tasks. The dataset provides ~6 hours of teleoperated 'play' data per environment (~24 h total), recorded with static and gripper RGB-D cameras, tactile images, and proprioception. License: MIT. Download: https://github.com/mees/calvin/blob/main/dataset/README.md.
Hardware / simulator: Embodiment: franka_panda. Environment: simulation, tabletop. Realness: simulated.
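
The five-step evaluation chains described above can be summarized in a few lines of code. The sketch below is not taken from the CALVIN codebase: the function names, the input format (one list of per-sub-task success flags per rollout), and the choice of summary statistics are assumptions for illustration. It counts how many sub-tasks each rollout completes before its first failure, then reports the fraction of rollouts reaching each chain length and the average number completed in a row.

```python
from typing import Dict, List, Sequence

CHAIN_LENGTH = 5  # sub-tasks per evaluation rollout, as stated in the Setup text


def completed_in_a_row(results: Sequence[bool]) -> int:
    """Count the sub-tasks completed before the first failure in one rollout."""
    count = 0
    for ok in results:
        if not ok:
            break
        count += 1
    return count


def summarize_chains(rollouts: Sequence[Sequence[bool]]) -> Dict[str, object]:
    """Summarize chained rollouts (hypothetical helper, not a CALVIN API).

    Each rollout is a sequence of success flags, one per sub-task in order.
    Flags beyond CHAIN_LENGTH are ignored.
    """
    if not rollouts:
        raise ValueError("need at least one rollout")
    lengths: List[int] = [completed_in_a_row(r[:CHAIN_LENGTH]) for r in rollouts]
    n = len(lengths)
    # Fraction of rollouts that completed at least k sub-tasks in a row.
    success_at = {
        k: sum(length >= k for length in lengths) / n
        for k in range(1, CHAIN_LENGTH + 1)
    }
    return {"success_at": success_at, "avg_len": sum(lengths) / n}


if __name__ == "__main__":
    # Made-up outcomes for four rollouts, purely for demonstration.
    demo = [
        [True, True, True, True, True],
        [True, True, False, False, False],
        [False, False, False, False, False],
        [True, False, True, True, True],
    ]
    print(summarize_chains(demo))
```

A success after an earlier failure does not count here: the chain is treated as broken at the first failed sub-task, which is why the last demo rollout contributes a length of 1.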

Schema

play sequences -> per-timestep .npy dicts -> {rgb_static, rgb_gripper, depth_static, depth_gripper, tactile, robot_obs, scene_obs, actions, rel_actions}; language annotations stored separately with task IDs and precomputed embeddings
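
As a rough illustration of the schema above, the following sketch checks that a per-timestep dict carries all nine listed keys and splits it by modality. The key names come from the schema line; the grouping into camera, tactile, state, and action fields, the helper names, and the placeholder values are assumptions for illustration. With the real data, `step` would be the dict loaded from one per-timestep file (for example via NumPy), which is not shown here.

```python
from typing import Any, Dict, List, Mapping

# Key names are taken from the schema line; the grouping is an assumption.
SCHEMA_GROUPS: Dict[str, List[str]] = {
    "camera": ["rgb_static", "rgb_gripper", "depth_static", "depth_gripper"],
    "tactile": ["tactile"],
    "state": ["robot_obs", "scene_obs"],
    "action": ["actions", "rel_actions"],
}
EXPECTED_KEYS: List[str] = [k for keys in SCHEMA_GROUPS.values() for k in keys]


def missing_keys(step: Mapping[str, Any]) -> List[str]:
    """Return the schema keys that are absent from one timestep dict."""
    return [k for k in EXPECTED_KEYS if k not in step]


def split_by_group(step: Mapping[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Split a timestep dict into per-modality sub-dicts (hypothetical helper)."""
    missing = missing_keys(step)
    if missing:
        raise KeyError(f"timestep is missing keys: {missing}")
    return {
        group: {k: step[k] for k in keys}
        for group, keys in SCHEMA_GROUPS.items()
    }


if __name__ == "__main__":
    # Placeholder values stand in for the real arrays.
    fake_step = {k: None for k in EXPECTED_KEYS}
    groups = split_by_group(fake_step)
    for name, fields in groups.items():
        print(name, sorted(fields))
```

Failing loudly on a missing key is a deliberate choice in this sketch: a silently absent modality (for example a depth image) is harder to debug once batches are already being assembled.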
