EgoDex
Apple · 2025 · datasets.bot
One-liner. 829 hours of egocentric Apple Vision Pro video with paired 3D hand/finger tracking across 194 manipulation tasks.
Setup
- Datasets / benchmarks: EgoDex is a large-scale egocentric dataset of human manipulation from Apple: 829 hours of 1080p, 30 Hz video (~90M frames) recorded with Apple Vision Pro. On-device tracking pairs each frame with 3D pose annotations for the head, upper body, and hands (68 joints), alongside camera intrinsics and natural-language task descriptions. The data spans 194 tabletop tasks with everyday objects (~338K episodes) and is intended for imitation learning of dexterous manipulation from human video. License: research-only. Download: https://ml-site.cdn-apple.com/datasets/egodex/part1.zip
- Hardware / simulator: Embodiment: human. Environment: home, tabletop. Realness: physical.
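The headline figures above can be cross-checked with a little arithmetic. The sketch below is only a sanity check: it assumes a constant 30 Hz frame rate and treats "~338K" as exactly 338,000, so the derived averages are rough estimates, not statistics reported by the dataset authors.

```python
# Sanity check of the quoted dataset statistics (stdlib only).
HOURS = 829          # total video duration, from the description above
FPS = 30             # assumed constant frame rate (30 Hz)
EPISODES = 338_000   # "~338K episodes", treated as exact for this estimate
TASKS = 194          # number of distinct tasks

total_seconds = HOURS * 3600
total_frames = total_seconds * FPS               # should be close to 90M
mean_episode_seconds = total_seconds / EPISODES  # rough average episode length
mean_episodes_per_task = EPISODES / TASKS        # rough average, not a real split

print(f"total frames:          {total_frames:,}")
print(f"mean episode length:   {mean_episode_seconds:.1f} s")
print(f"mean episodes per task: {mean_episodes_per_task:.0f}")
```

The frame count lands just under 90 million, consistent with "~90M frames", and the implied average episode is a little under nine seconds long.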
Schema
Paired MP4 (1080p, 30 Hz) + HDF5: camera intrinsics, per-frame SE(3) transforms for 68 head/upper-body/hand joints, confidence scores, and language task descriptions.
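To illustrate how the intrinsics and per-frame SE(3) transforms might be combined, here is a minimal sketch that projects one joint into pixel coordinates. It is not the dataset's loader and makes several assumptions: each transform is a 4x4 homogeneous matrix mapping a local frame (joint or camera) into a shared world frame, the intrinsics are a 3x3 pinhole matrix, and the camera looks along +z. The real files may use a different axis convention or layout. The helper names and all numeric values are hypothetical.

```python
# Sketch: project a 3D joint into the image with a pinhole camera model.
# Assumptions (not confirmed by the dataset docs): 4x4 world-from-local
# transforms, 3x3 intrinsics K, camera looking along +z.

def invert_se3(T):
    """Invert a rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R_t = [[T[c][r] for c in range(3)] for r in range(3)]  # transpose of R
    t = [T[r][3] for r in range(3)]
    new_t = [-sum(R_t[r][c] * t[c] for c in range(3)) for r in range(3)]
    return [R_t[r] + [new_t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def transform_point(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return [sum(T[r][c] * p[c] for c in range(3)) + T[r][3] for r in range(3)]

def project(K, p_cam):
    """Pinhole projection; returns None for points at or behind the camera."""
    x, y, z = p_cam
    if z <= 0:
        return None
    return (K[0][0] * x / z + K[0][2], K[1][1] * y / z + K[1][2])

# Hypothetical values for a single frame.
K = [[1000.0, 0.0, 960.0],
     [0.0, 1000.0, 540.0],
     [0.0, 0.0, 1.0]]
world_from_camera = [[1.0, 0.0, 0.0, 0.1],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]]
world_from_joint = [[1.0, 0.0, 0.0, 0.2],
                    [0.0, 1.0, 0.0, 0.05],
                    [0.0, 0.0, 1.0, 0.5],
                    [0.0, 0.0, 0.0, 1.0]]

# The joint's position is the translation column of its transform.
joint_world = [world_from_joint[r][3] for r in range(3)]
joint_cam = transform_point(invert_se3(world_from_camera), joint_world)
pixel = project(K, joint_cam)
print(pixel)
```

In practice the per-joint confidence scores would be a natural filter before projecting, for example skipping joints below a chosen threshold. The threshold itself is a design choice, not something the schema specifies.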
Links