RoboMIND 2.0: A Multimodal, Bimanual Mobile Manipulation Dataset for Generalizable Embodied Intelligence

Hou, Wu, Liu, Che, Wu, Liao, Li, He, Feng, et al. · 2026 · arXiv preprint · arXiv:2512.24653

Dongyu supplement. Added as a candidate missing/adjacent dataset paper after searching primary arXiv sources. This is a draft summary for triage, not a full paper read.

One-liner. A larger RoboMIND successor with 310k real-world dual-arm trajectories, tactile-enhanced episodes, mobile manipulation, and language-planner/VLA framing.

Setup

Method

Dataset plus MIND-2 hierarchy: a high-level VLM planner decomposes natural language instructions into subgoals; a low-level VLA executor generates motor actions.
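The planner/executor split described above can be illustrated with a minimal sketch. This is not the MIND-2 implementation or its API: every name here (`Subgoal`, `toy_planner`, `toy_executor`, `run_hierarchy`, `ACTION_DIM`) is hypothetical, the planner is a string splitter standing in for a VLM, and the executor returns a zero vector standing in for a VLA policy. It only shows the control flow of "decompose into subgoals, then emit actions per subgoal".

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Placeholder action size; an assumption for illustration, not taken from the paper.
ACTION_DIM = 14


@dataclass
class Subgoal:
    """One step produced by the high-level planner."""
    text: str


def toy_planner(instruction: str) -> List[Subgoal]:
    """Stand-in for a VLM planner: splits the instruction on ' then '."""
    parts = [p.strip() for p in instruction.split(" then ")]
    return [Subgoal(p) for p in parts if p]


def toy_executor(subgoal: Subgoal, observation: Sequence[float]) -> List[float]:
    """Stand-in for a VLA executor: ignores its inputs and returns a zero action."""
    return [0.0] * ACTION_DIM


def run_hierarchy(
    instruction: str,
    observation: Sequence[float],
    planner: Callable[[str], List[Subgoal]],
    executor: Callable[[Subgoal, Sequence[float]], List[float]],
) -> List[Tuple[str, List[float]]]:
    """Plan once, then query the executor once per subgoal, in order."""
    trace = []
    for subgoal in planner(instruction):
        action = executor(subgoal, observation)
        trace.append((subgoal.text, action))
    return trace


if __name__ == "__main__":
    steps = run_hierarchy(
        "pick up the cup then place it on the tray",
        observation=[0.0],
        planner=toy_planner,
        executor=toy_executor,
    )
    for text, action in steps:
        print(text, len(action))
```

In a real system the executor would presumably be called in a closed loop with fresh observations until each subgoal completes; the single call per subgoal here is a simplification.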

Why it matters for the map

It is a recent, large-scale example of language-conditioned manipulation data that includes tactile-enhanced episodes.

Limitations / open questions

The dataset is likely broad rather than focused on named dynamic state-change concepts. Its openness and exact annotation schema still need manual inspection.

Source note

arXiv lines 30-39 report title, scale, tactile/mobile subsets, simulated data, and language/VLA framing.