Taxim: An Example-based Simulation Model for GelSight Tactile Sensors

Zilin Si, Wenzhen Yuan (CMU Robotics Institute) · 2021 · IEEE RA-L 2022 (arXiv v2, Dec 2021)

One-liner. A data-driven, CPU-real-time GelSight simulator that fits a per-pixel polynomial reflectance table from <100 real contact examples and adds the first integrated marker-motion field model, so you can synthesize realistic tactile images (both the optical geometry signal and the force/shear marker flow) without expensive physics rendering.

Problem & motivation

Mainstream robot simulators (PyBullet, MuJoCo, Isaac Gym, Drake, SOFA) model rigid/soft bodies and vision but have no native tactile sensing, yet vision-based tactile sensors like GelSight give high-resolution contact geometry and force. Simulating them is hard because it requires modeling both the mechanical response of the soft gelpad and the optical response (LED illumination + embedded camera). Prior optical sims were physics-based (Phong shading, ray tracing, TACTO's pyrender): computationally heavy, hard to migrate to a new sensor, and unable to reproduce the intrinsic noise of real sensors. No prior work simulated the marker-motion field (the shear-force signal) jointly with the optical image.

Method

Taxim has two calibrated components fed by a contact height map (collision detection → local height map → pyramid-Gaussian-kernel soft-body approximation of the gelpad).
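The height-map step can be sketched as follows. This is a minimal illustration, assuming the soft-body approximation amounts to repeatedly Gaussian-smoothing the indentation map from coarse to fine while re-imposing the indenter geometry inside the contact region. The function names, the sigma schedule, and the pixel-unit sphere are hypothetical choices, not values from the paper.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (pure NumPy)."""
    radius = int(3 * sigma) + 1
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    k /= k.sum()
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def gel_indentation(object_height, press_depth, sigmas=(8.0, 4.0, 2.0)):
    """Approximate gel indentation depth for a pressed object.

    object_height: height of the object surface above its lowest point.
    press_depth:   how far the lowest point is pushed into the gel.
    sigmas:        coarse-to-fine blur widths (assumed, not from the paper).
    """
    contact_depth = np.clip(press_depth - object_height, 0.0, None)
    in_contact = contact_depth > 0
    depth = contact_depth.copy()
    for sigma in sigmas:
        # Smooth everywhere, then restore the exact shape where the object touches.
        depth = np.where(in_contact, contact_depth, gaussian_blur(depth, sigma))
    return depth

# Toy example: a sphere of radius 20 px pressed 3 px into a 64x64 gel patch.
yy, xx = np.mgrid[0:64, 0:64]
r2 = (xx - 32) ** 2 + (yy - 32) ** 2
R = 20.0
sphere = R - np.sqrt(np.clip(R ** 2 - r2, 0.0, None))
depth = gel_indentation(sphere, press_depth=3.0)
```

Inside the contact patch the map keeps the indenter's exact geometry; outside it the gel surface decays smoothly toward zero, which is the qualitative behaviour the soft-body approximation is meant to capture.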

1. Optical simulation via example-based photometric stereo. The diffuse gelpad makes reflectance spatially invariant, so intensity is a function of the local surface normal. A naive linear lookup table I = Σ_l a^l n assumes parallel, uniform light, but GelSight's LEDs are close to the gelpad and non-uniform. Taxim instead fits a polynomial table that is also a function of image position: f_n^l(x, y) = w_n^l b, where w_n^l is the weight vector for normal bin n and light l, and b = [x², y², xy, x, y, 1]^T (a 2nd-order polynomial sufficed). The table is indexed by a discretized 125×125 surface-normal grid (magnitude × direction), per RGB light source, and is solved by least squares from ball-indenter contacts. Per-point normals are then mapped through this table to synthesize the image (Fig 2, 3).
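A minimal sketch of the polynomial lookup table, assuming each calibration pixel arrives with a pre-computed normal bin (magnitude index, direction index), an image position, and an observed RGB value. The function names, the array layout (125, 125, 3, 6), and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def poly_basis(x, y):
    """b = [x^2, y^2, xy, x, y, 1] for each sample, shape (P, 6)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.stack([x ** 2, y ** 2, x * y, x, y, np.ones_like(x)], axis=-1)

def fit_table(mag_idx, dir_idx, x, y, rgb, n_bins=125):
    """Least-squares fit of one 6-vector per (normal bin, colour channel)."""
    table = np.zeros((n_bins, n_bins, 3, 6))
    B = poly_basis(x, y)
    keys = mag_idx * n_bins + dir_idx
    for key in np.unique(keys):
        sel = keys == key
        # Solves B[sel] @ w = rgb[sel] for all three channels at once; w is (6, 3).
        # Bins with fewer than 6 samples get the minimum-norm solution.
        w, *_ = np.linalg.lstsq(B[sel], rgb[sel], rcond=None)
        table[key // n_bins, key % n_bins] = w.T
    return table

def render(table, mag_idx, dir_idx, x, y):
    """Look up each pixel's bin and evaluate its polynomial at (x, y)."""
    weights = table[mag_idx, dir_idx]            # (P, 3, 6)
    return np.einsum("pcb,pb->pc", weights, poly_basis(x, y))

# Synthetic check: sample from a known table and recover it.
rng = np.random.default_rng(0)
P = 2000
mag_idx = rng.integers(0, 4, P)
dir_idx = rng.integers(0, 4, P)
x = rng.uniform(-1, 1, P)                        # normalised image coordinates
y = rng.uniform(-1, 1, P)
true_table = rng.normal(size=(125, 125, 3, 6))
rgb = render(true_table, mag_idx, dir_idx, x, y)
table = fit_table(mag_idx, dir_idx, x, y, rgb)
```

Bins never observed during calibration stay at zero here; a practical pipeline would need to interpolate or otherwise fill them, which this sketch does not attempt.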

2. Shadow simulation by superposition of "unit" shadows. Shadows from the red/green/blue LED groups are simulated by collecting a "unit" shadow mask (a single standing pin at varying depths, ~10 examples). Arbitrary geometry is approximated as side-by-side accumulated pin shadows; since beams travel independently (no inter-reflection), shadows are linearly accumulated and attached where neighbors are lower (Fig 5).

3. Marker motion field via linear displacement + superposition. Markers move because the elastomer surface stretches under normal + shear load. Taxim meshes the surface densely; nodes are either active (in contact, externally loaded) or passive (subject only to internal elastic forces). The mutual influence between nodes n_i and n_j is a 3×3 tensor T^{n_i}_{n_j}; the displacement of any node n_j is the superposition u_j = Σ_i T^{k_i}_{n_j} u^{k_i}, summed over the active nodes k_i (Eq 4). Because active nodes also influence each other, their initial displacements are first amended to virtual displacements by inverting a matrix of inter-node tensors (Eq 5–7), and then superposed. The tensor T is calibrated offline in ANSYS FEM (unit-node loads on the dense gelpad mesh, sampling the 2nd-layer mesh 0.5 mm below the surface); the online simulation is just matrix operations. The whole model calibrates from <100 real contacts in ~1 hour.
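The virtual-displacement correction and superposition can be sketched as below. This is an illustration under stated assumptions: the active-to-active tensors are stacked into one block matrix with identity self-influence, and the tensors come from a toy isotropic decay kernel (`toy_tensors`, hypothetical) rather than from the ANSYS calibration the paper uses.

```python
import numpy as np

def toy_tensors(src, dst, length=2.0):
    """Stand-in for FEM-calibrated tensors: exp(-distance / length) * I_3.

    Returns shape (S, D, 3, 3); entry [i, j] maps a displacement at
    source node i to the displacement it induces at destination node j.
    """
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)
    return np.exp(-d / length)[:, :, None, None] * np.eye(3)

def virtual_displacements(T_aa, u_active):
    """Solve for virtual displacements v so that superposing v over all
    active nodes reproduces the prescribed active displacements."""
    K = u_active.shape[0]
    # Block (j, i) of A holds the tensor from active node i to active node j.
    A = T_aa.transpose(1, 2, 0, 3).reshape(3 * K, 3 * K)
    return np.linalg.solve(A, u_active.reshape(-1)).reshape(K, 3)

def superpose(T, v):
    """u_j = sum_i T[i -> j] @ v_i for every destination node j."""
    return np.einsum("ijab,ib->ja", T, v)

active_pos = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
passive_pos = np.array([[2.0, 0.0, 0.0], [2.0, 2.0, 0.0],
                        [0.0, 3.0, 0.0], [4.0, 4.0, 0.0]])
# Prescribed displacements of the contacting nodes (shear in x, press in z).
u_active = np.array([[0.1, 0.0, -0.5], [0.1, 0.0, -0.4], [0.1, 0.0, -0.4]])

T_aa = toy_tensors(active_pos, active_pos)
T_ap = toy_tensors(active_pos, passive_pos)
v = virtual_displacements(T_aa, u_active)
u_passive = superpose(T_ap, v)
```

Superposing the raw `u_active` directly would double-count, because each active node would also be moved by its neighbours' influence; solving for `v` first removes that overlap, and the online cost stays at one linear solve plus one tensor contraction.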

Setup

Datasets / benchmarks: Self-collected. Optical calibration: 50 points, 4 mm spherical indenter; shadow: 10 points, 1 mm pin indenter. Evaluation objects designed in SolidWorks and 3D-printed (10×10 or 15×15 mm bases); also Google Scanned Objects for qualitative tests. 4 different GelSight sensors + one DIGIT sensor.
Hardware / simulator: GelSight on an XYR optical stage + vertical linear stage (0.01 mm precision); dome-shaped gelpad. FEM calibration in ANSYS. Optical sim built on example-based photometric stereo; intended to bridge into robot physics engines.
Baselines: TACTO (pyrender) [Wang et al.], Phong's model [Gomes et al.], physics-based rendering [Agarwal et al.]; marker field vs. ANSYS FEM and real-sensor data.
Compute: AMD Ryzen Threadripper 2950X 16-core CPU (no GPU used). Optical sim 9.6–18.1 fps; marker field 9.22 s/sim; ANSYS FEM reference 2–4 hrs/case.

Results

Optical: best on all four image-similarity metrics (lowest L1 and MSE, highest SSIM and PSNR) vs. all three baselines (Table I), and fastest on CPU (Table II).

Method	L1 ↓	MSE ↓	SSIM ↑	PSNR ↑	fps (CPU)
TACTO	10.861	215.861	0.808	25.495	1.9
Phong's	8.163	123.249	0.832	27.763	3.8
Physics	7.409	90.623	0.759	28.687	0.1
Taxim (ours)	5.565	58.358	0.882	30.974	18.1 (9.6 w/ shadows)
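For reference, the pixel-wise metrics in the table above can be computed per image pair as in this sketch. It assumes 8-bit images (peak value 255) and the standard definitions; the paper's exact averaging protocol is not recorded in these notes, and SSIM is left out because it needs a windowed implementation.

```python
import numpy as np

def image_metrics(sim, real, peak=255.0):
    """Return (L1, MSE, PSNR in dB) between a simulated and a real image."""
    # Cast first: subtracting uint8 arrays directly would wrap around.
    err = np.asarray(sim, dtype=np.float64) - np.asarray(real, dtype=np.float64)
    l1 = np.abs(err).mean()
    mse = (err ** 2).mean()
    psnr = np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
    return l1, mse, psnr

# Toy pair: every pixel is off by exactly 10 grey levels.
sim_img = np.zeros((4, 4, 3), dtype=np.uint8)
real_img = np.full((4, 4, 3), 10, dtype=np.uint8)
l1, mse, psnr = image_metrics(sim_img, real_img)
```

Under the 8-bit assumption, plugging Taxim's mean MSE into the formula gives 10·log10(255²/58.358) ≈ 30.47 dB, slightly below the reported 30.974. That gap would be consistent with PSNR being averaged per image rather than derived from the mean MSE, though these notes cannot confirm it.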

Generalizes across 4 GelSight sensors and a DIGIT sensor; handles fine textures and varying indentation depth/location (MSE grows with depth and with distance from center). Marker field: vs. FEM, interpolated L1 errors are ~3–5×10⁻³ mm per axis. Marker-magnitude L1: 1.00×10⁻² mm (real vs. FEM), 1.02×10⁻² mm (real vs. ours), 3.96×10⁻³ mm (FEM vs. ours). Weighted angular L1: 12.94° (real vs. FEM), 14.57° (real vs. ours), 4.89° (FEM vs. ours). In other words, Taxim tracks FEM tightly but inherits FEM's sim-to-real gap.

Limitations & open questions

(a) Author-stated. Only quasi-static contact is simulated; dynamic phenomena like slip are not modeled, and the model cannot capture partial slip under shear (a common real case). Real–sim gap attributed to: hand-manufactured gelpad not matching the ANSYS FEM model, marker-tracking noise in real data, and the no-partial-slip assumption. Polynomial table must be recalibrated per sensor (and whenever a component is replaced). GPU acceleration left for future work.

(b) What I noticed reading it. Evaluation is small and self-designed: the marker study reports a handful of load cases (0.3–0.8 mm displacement) with no variance/CI; metrics are dataset means over a modest object set, so statistical confidence is weak. Optical ground truth required manual alignment in GIMP because the real rig isn't precise enough, so the reported pixel errors partly reflect alignment quality, not just simulation fidelity. The optical table is fit on spherical-indenter normals but evaluated on textured objects, so out-of-distribution normal coverage is untested. Crucially, there is no downstream task evaluation: the paper never trains a policy or perception model on Taxim images and shows sim-to-real transfer (the headline use case for a tactile simulator); that is deferred to "future work." The DIGIT result is qualitative only. The speed comparison is CPU-only and arguably unfair, since TACTO and the physics model can be GPU-accelerated.

Why I care

Off the central long-horizon-planning thesis, but squarely on the batch thesis that many manipulation predicates (is_grasped, is_inserted, surface_is_rough, is_screwed_tight) live in touch/force, not vision. Learning a tactile predicate classifier (the touch analogue of BLADE's visual predicate classifiers) needs either real tactile data at scale or a faithful simulator. Taxim is that simulator for GelSight: cheap to calibrate, CPU-real-time, and it uniquely produces the marker/shear signal that encodes contact force, which is exactly what an is_slipping or grasp_is_stable predicate would key off. It is infrastructure, not a learning method: it lets a BLADE-style pipeline generate tactile training data for predicates that are not visually evaluable. Caveat for that use: Taxim is quasi-static and can't render slip, so dynamic contact predicates would need a different generator. Pairs naturally with the simulators and datasets in this batch's Cluster I.

Quotable

To the best of our knowledge, Taxim is the first model that simulates all functions of vision-based tactile sensors, including the optical response for geometry measurement and marker motion field for force/torque measurement. — §II / p.2

The simulation model is calibrated with less than 100 data points from a real sensor. The example-based approach enables the model to easily migrate to other GelSight sensors or its variations. — Abstract / p.1

Cited here, worth ingesting next:

GelSight: High-resolution robot tactile sensors — the sensor Taxim simulates (Yuan, Dong, Adelson).
DIGIT: low-cost compact tactile sensor — Taxim is also demoed on DIGIT.

Newly ingested in 2026-06-24 batch — directly relevant:

TACTO — the pyrender physics-based GelSight/DIGIT simulator Taxim benchmarks against and beats on fidelity and CPU speed.
TacEx — GelSight simulation inside Isaac Sim; the GPU/physics-engine-integrated successor direction Taxim flagged as future work.
GelSight sensor — foundational hardware whose optical+marker signals Taxim models.
Touch and Go — real vision-touch dataset; complements simulated tactile data for representation learning.