DexYCB

NVIDIA (with University of Washington) · 2021 · datasets.bot

One-liner. NVIDIA RGB-D benchmark: 582K frames of 10 humans grasping 20 YCB objects, captured from 8 RealSense cameras with hand/object pose labels.

Setup

Datasets / benchmarks: DexYCB is a large-scale RGB-D dataset of 582,000 frames capturing 10 human subjects grasping 20 objects from the YCB-Video set. It was recorded synchronously from 8 calibrated Intel RealSense D415 cameras arranged around a tabletop (640x480 at 30 fps, 1,000 trials total). Each frame is annotated with segmentation masks, 6D object poses, MANO hand pose, and 2D/3D hand joints. Supported tasks include 2D detection, 6D object pose estimation, 3D hand pose estimation, and a robotics-relevant task: generating safe robot grasps for human-to-robot handover. License: CC-BY-NC-4.0. Download: https://dex-ycb.github.io/.
Hardware / simulator: Embodiment: human. Environment: lab, tabletop. Realness: physical.

Schema

per frame: color image (JPG), aligned depth image (PNG), and a label file (NPZ) holding {segmentation, 6D object poses, MANO hand pose, 2D/3D hand joints}; organized by subject and date/time sequence, with one folder per camera view (8 views)
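A minimal sketch of locating and reading one frame. It assumes the file naming used by the official dex-ycb-toolkit (`color_XXXXXX.jpg`, `aligned_depth_to_color_XXXXXX.png`, `labels_XXXXXX.npz` inside a `<subject>/<sequence>/<camera>/` folder) and assumes the NPZ keys `seg`, `pose_y`, `pose_m`, `joint_3d`, `joint_2d`. The helper names `frame_paths`, `load_labels`, and `missing_label_keys` are hypothetical; check the layout against the actual download before relying on it.

```python
import os
import numpy as np

# Assumed label keys (per the dex-ycb-toolkit convention, not verified here):
# seg = segmentation mask, pose_y = 6D object poses, pose_m = MANO hand pose,
# joint_3d / joint_2d = 3D / 2D hand joints.
ASSUMED_LABEL_KEYS = ("seg", "pose_y", "pose_m", "joint_3d", "joint_2d")


def frame_paths(root, subject, sequence, camera, frame_idx):
    """Hypothetical helper: build per-frame file paths for one camera view."""
    folder = os.path.join(root, subject, sequence, camera)
    return {
        "color": os.path.join(folder, f"color_{frame_idx:06d}.jpg"),
        "depth": os.path.join(folder, f"aligned_depth_to_color_{frame_idx:06d}.png"),
        "labels": os.path.join(folder, f"labels_{frame_idx:06d}.npz"),
    }


def load_labels(npz_path):
    """Load every array stored in a per-frame NPZ label file into a dict."""
    with np.load(npz_path) as data:
        return {key: data[key] for key in data.files}


def missing_label_keys(labels):
    """Return the assumed label keys that are absent from a loaded label dict."""
    return [key for key in ASSUMED_LABEL_KEYS if key not in labels]
```

Images can then be read with any image library from the `color` and `depth` paths; keeping the loader as a plain dict of arrays avoids committing to exact array shapes, which may differ from what is assumed here.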
