RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot

Fang, Fang, Tang, Liu, Wang, Wang, Zhu, Lu · 2023 · RSS 2023 workshop / arXiv · arXiv:2307.00595

Dongyu supplement. Added after a search of primary arXiv sources, as a candidate dataset paper that is either missing from the map or adjacent to it. This is a draft summary for triage, not a full paper read.

One-liner. A public, contact-rich robot manipulation dataset of more than 110k real-world sequences, each pairing visual, force, audio, and action streams with a human demonstration video and a language description.

Setup

Datasets / benchmarks: Introduces RH20T: over 110,000 contact-rich robot manipulation sequences across diverse skills, contexts, robots, and viewpoints. Each sequence includes visual, force, audio, and action information, plus a human demonstration video and language description. The dataset is public.
Hardware / simulator: Multiple real-world robots and camera viewpoints; calibrated sensors for visual, force, audio, and action streams.

Method

Large-scale real-world dataset for one-shot imitation/generalization across diverse manipulation skills.

Why it matters for the map

Highly relevant to the map: it pairs vision with force, audio, and language descriptions at scale, rather than offering vision alone.

Limitations / open questions

The language descriptions may be task-level rather than fine-grained annotations of dynamic state changes.

Source note

arXiv lines 38-42 report dataset size, modalities, language descriptions, and public availability.