TartanAir V2
CMU AirLab · 2024 · datasets.bot
One-liner. Next-generation photorealistic synthetic SLAM and navigation dataset from CMU AirLab, spanning 65 Unreal Engine environments with multimodal sensor data (RGB, depth, segmentation, optical flow, LiDAR, IMU, event cameras) and customizable pinhole, fisheye, and equirectangular camera models.
Setup
- Datasets / benchmarks: TartanAir V2 is a large-scale photorealistic synthetic dataset built on Unreal Engine 4 with the AirSim plugin, developed by CMU's AirLab (castacks) to push the limits of visual SLAM, navigation, and robotics perception. It contains 65 highly distinct simulated environments, both indoor and outdoor, covering urban, rural, domestic, infrastructure, thematic, and nature scenarios. Some environments are split into sub-environments for weather and time-of-day variation.
Data is collected along pre-recorded, challenging trajectories of a generic free-flying 6-DoF camera that mimic real-world robot motion: waypoints are sampled in free space, connected via RRT*, refined to include loop closures, and smoothed for physically plausible motion. In each environment, 12 perfectly synchronized cameras (two stereo sets of six cameras, each set pointing in six directions to cover a 360-degree view) capture raw pinhole RGB at 640x640 with a 90-degree FoV at 10 Hz, with a 0.25 m stereo baseline.
The raw data is processed into the following modalities: stereo RGB images (plus a 1000 Hz MP4 video of the left-front camera); float32 depth maps (compressed losslessly to 4-channel 8-bit PNG); category-level semantic segmentation spanning 1447 semantic classes (manually labeled, with a per-environment seg_label_map.json and statistics); optical flow (CUDA-accelerated, generatable across any camera-model pair, stored as npz with covisibility/FoV masks); LiDAR point clouds sampled from depth following Velodyne VLP-16 (and VLP-32C) scan patterns; IMU with configurable noise modeling; event-camera data generated at 1000 Hz via the ESIM simulator with contrast thresholds of 0.2-1.0; occupancy maps; and ground-truth camera poses.
The accompanying tartanairpy toolkit supports sampling customizable camera models, including pinhole, fisheye (Double Sphere and Linear Spherical models), and equirectangular/panoramic views, and provides tools for adding noise and motion blur. The dataset is downloaded and managed through the 'tartanair' Python package (pip install tartanair), with data hosted on the AirLab server and on Hugging Face (theairlabcmu/tartanair, ~1.11 TB).
TartanAir V2 succeeds TartanAir V1 (IROS 2020, the official CVPR 2020 Visual SLAM Challenge dataset), adding more scenes, more modalities, and the new fisheye/panoramic camera support. It also underpins the related TartanGround ground-robot dataset. License: CC-BY-4.0. Download: https://www.tartanair.org/.
- Hardware / simulator: Embodiment: not listed. Environment: home, industrial, office, outdoor, simulation. Realness: simulated.
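The raw camera parameters given above (640x640 pinhole images, 90-degree FoV, 0.25 m stereo baseline) are enough to work out the focal length in pixels and the stereo disparity expected at a given depth. The following is a minimal sketch, not part of the tartanair package. It assumes an ideal pinhole camera with the principal point at the image centre and depth measured along the optical axis; the function names are hypothetical.

```python
import math

# Values taken from the dataset description.
WIDTH = 640          # image width in pixels
HEIGHT = 640         # image height in pixels
FOV_DEG = 90.0       # field of view in degrees
BASELINE_M = 0.25    # stereo baseline in metres


def focal_length_px(size_px, fov_deg):
    """Focal length in pixels of an ideal pinhole camera along one axis."""
    return (size_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def pinhole_intrinsics(width, height, fov_deg):
    """3x3 intrinsic matrix as nested lists.

    Assumption: principal point at the image centre, same FoV on both axes.
    """
    fx = focal_length_px(width, fov_deg)
    fy = focal_length_px(height, fov_deg)
    return [[fx, 0.0, width / 2.0],
            [0.0, fy, height / 2.0],
            [0.0, 0.0, 1.0]]


def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Stereo disparity in pixels for a rectified pair: d = f * B / Z.

    Works on floats and, through operator overloading, on NumPy arrays.
    """
    return focal_px * baseline_m / depth_m


K = pinhole_intrinsics(WIDTH, HEIGHT, FOV_DEG)
disparity_at_4m = depth_to_disparity(4.0, K[0][0], BASELINE_M)
print(f"focal length: {K[0][0]:.1f} px, disparity at 4 m: {disparity_at_4m:.1f} px")
```

With a 90-degree FoV, the focal length equals half the image width, so the focal length times the baseline is about 80 pixel-metres and disparity falls off as 80 / depth.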
Schema
Per-environment trajectories recorded with 12 synchronized cameras (two 6-camera stereo rings covering 360 degrees) at 640x640 and 10 Hz, with a 0.25 m baseline. Per timestep: stereo RGB PNG (8-bit), float32 depth (stored as 4-channel 8-bit PNG), semantic segmentation (1447 classes, mapped via seg_label_map.json), and camera poses. Derived modalities: optical flow (npz with covisibility/FoV masks), LiDAR point clouds (VLP-16/VLP-32C patterns), IMU with noise, event-camera streams generated by ESIM from the 1000 Hz MP4 video, and occupancy maps. Pinhole, fisheye, and equirectangular camera views can be sampled on demand.
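Storing float32 depth as a 4-channel 8-bit PNG is lossless because each pixel's four bytes can be reinterpreted as one 32-bit float. The sketch below illustrates that round trip on an in-memory array. It assumes little-endian byte order and that the four channels hold the float's bytes in memory order; neither detail is confirmed by the text, so check it against the toolkit's own reader before relying on it. The helper names `encode_depth` and `decode_depth` are hypothetical.

```python
import numpy as np


def encode_depth(depth):
    """Pack an (H, W) float32 depth map into an (H, W, 4) uint8 array.

    Assumption: little-endian float32, bytes laid out in memory order.
    """
    depth = np.ascontiguousarray(depth, dtype="<f4")
    # Reinterpret each 4-byte float as 4 separate bytes (no value conversion).
    return depth.view(np.uint8).reshape(depth.shape + (4,))


def decode_depth(rgba):
    """Recover the (H, W) float32 depth map from an (H, W, 4) uint8 array."""
    rgba = np.ascontiguousarray(rgba, dtype=np.uint8)
    # Reinterpret each group of 4 bytes as one little-endian float32.
    return rgba.view("<f4").reshape(rgba.shape[:2])


# Round trip on synthetic data standing in for a decoded PNG.
rng = np.random.default_rng(0)
depth_in = rng.uniform(0.1, 100.0, size=(640, 640)).astype(np.float32)
packed = encode_depth(depth_in)
depth_out = decode_depth(packed)
print(packed.shape, packed.dtype, np.array_equal(depth_in, depth_out))
```

When reading an actual file, the image loader must return all four channels unmodified (for example, OpenCV's `cv2.imread` with the `cv2.IMREAD_UNCHANGED` flag), since any colour conversion or channel drop would corrupt the packed floats.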
Links