Ego-Exo4D
Meta FAIR (Project Aria) and 15+ university partners · 2023 · datasets.bot
One-liner. A large-scale multimodal dataset from Meta FAIR: time-synchronized egocentric (Aria) and exocentric (GoPro) video of skilled human activities.
Setup
- Datasets / benchmarks: Ego-Exo4D is a large-scale multimodal, multiview video dataset of skilled human activities (cooking, music, soccer, health, basketball, dance, bike repair, rock climbing) captured simultaneously from first-person Aria glasses and third-person GoPro cameras. It contains 1,286.3 hours of video from 740 camera wearers across 13 cities and 123 scene contexts, with multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions including expert commentary. It ships four benchmark tasks: keystep/fine-grained activity recognition, proficiency estimation, ego-exo relation/cross-view translation, and 3D hand/body pose estimation. License: custom. Download: https://ego4d.dev/request/ego-exo4d.
- Hardware / simulator: Embodiment: human. Environment: outdoor, kitchen. Realness: physical.
Schema
takes -> per-take {ego Aria VRS (RGB + 2 SLAM/fisheye + eye-tracking), 4-5 exo GoPro mp4, audio, gaze, IMU, 6DoF camera poses, 3D point cloud, JSON annotations}
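To make the per-take layout concrete, here is a minimal Python sketch of the schema as an in-memory record. It is not the official Ego-Exo4D loader or its on-disk layout: the class name `Take`, every field name, and the example paths are assumptions chosen for illustration. Only the listed modalities and the 4-5 exo camera count come from the schema line above.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class Take:
    """Hypothetical view of one take; all names here are illustrative."""

    take_id: str
    ego_vrs: Path                         # Aria VRS: RGB + 2 SLAM/fisheye + eye-tracking
    exo_videos: List[Path]                # 4-5 GoPro mp4 files
    audio: Optional[Path] = None
    gaze: Optional[Path] = None
    imu: Optional[Path] = None
    camera_poses: Optional[Path] = None   # 6DoF camera poses
    point_cloud: Optional[Path] = None    # 3D point cloud
    annotations: Dict[str, Path] = field(default_factory=dict)  # JSON annotations

    def validate(self) -> List[str]:
        """Return a list of mismatches against the schema line (empty if none)."""
        problems: List[str] = []
        if self.ego_vrs.suffix.lower() != ".vrs":
            problems.append(f"ego recording is not a .vrs file: {self.ego_vrs}")
        if not 4 <= len(self.exo_videos) <= 5:
            problems.append(f"expected 4-5 exo videos, got {len(self.exo_videos)}")
        for video in self.exo_videos:
            if video.suffix.lower() != ".mp4":
                problems.append(f"exo video is not an .mp4 file: {video}")
        return problems


# Example with made-up paths; nothing is read from disk.
take = Take(
    take_id="example_take",
    ego_vrs=Path("example_take/ego.vrs"),
    exo_videos=[Path(f"example_take/exo_{i}.mp4") for i in range(4)],
)
print(take.validate())  # prints []
```

The optional fields default to `None` so that a record can be built for a take where only some modalities have been downloaded; `validate` checks file extensions and the exo camera count only, not file contents.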