ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image

ICCV 2025 CDEL Workshop (Oral)

University of Maryland
* Indicates equal contribution
†Dongyu is affiliated with The University of Hong Kong. The work was done during an internship at the University of Maryland.
ControlTac Teaser

At a Glance: Starting from a single reference image, ControlTac generates thousands of augmented tactile images with controllable contact forces and positions (Left). These generated images prove highly effective for downstream tasks (Middle) and demonstrate practical utility in real-world robotic experiments (Right).

Abstract

Vision-based tactile sensing is widely used in perception, reconstruction, and robotic manipulation, yet collecting large-scale tactile data remains costly due to diverse sensor-object interactions and inconsistencies across sensor instances. Existing approaches to scaling tactile data—simulation and free-form tactile generation—often yield unrealistically rendered signals that transfer poorly to highly dynamic real-world tasks. We propose ControlTac, a two-stage controllable framework that generates realistic tactile images conditioned on a single reference tactile image, contact force, and contact position. By grounding generation in these physical priors, ControlTac produces realistic samples that effectively capture task-relevant variations. Across three downstream tasks and three real-world experiments, datasets augmented with our approach consistently improve performance and demonstrate practical utility in dynamic real-world settings.

Method Overview

ControlTac consists of two key components:

a. Force-Control: We feed the background-removed tactile image x into the DiT model, conditioned on the 3D contact force ΔF, to generate force-specific tactile variations.

b. Position-Control: We transfer the pretrained DiT from stage one and fine-tune it with ControlNet, conditioned on a contact mask c, to synthesize realistic tactile images y_B under different contact positions and forces.

ControlTac Framework
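To make the two-stage conditioning concrete, below is a minimal PyTorch sketch. It is illustrative only: the module names, shapes, and the per-channel force injection are our assumptions, not the paper's actual DiT/ControlNet architecture.

```python
# Minimal, illustrative sketch of ControlTac's two stages. Module names,
# shapes, and conditioning details are assumptions; the real model is a DiT
# fine-tuned with ControlNet, as described in the paper.
import torch
import torch.nn as nn

class ForceConditionedDenoiser(nn.Module):
    """Stage 1 stand-in: predict a tactile image conditioned on ΔF."""
    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        self.force_embed = nn.Linear(3, hidden)     # embed ΔF = (Fx, Fy, Fz)
        self.enc = nn.Conv2d(channels, hidden, 3, padding=1)
        self.dec = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x: torch.Tensor, delta_force: torch.Tensor) -> torch.Tensor:
        # Inject the force embedding as a per-channel bias (the real model
        # uses transformer blocks with learned conditioning).
        f = self.force_embed(delta_force)[:, :, None, None]
        return self.dec(torch.relu(self.enc(x) + f))

class PositionControlBranch(nn.Module):
    """Stage 2 stand-in: ControlNet-style branch conditioned on a contact mask."""
    def __init__(self, base: ForceConditionedDenoiser, hidden: int = 64):
        super().__init__()
        self.base = base
        for p in self.base.parameters():            # freeze the stage-1 weights
            p.requires_grad_(False)
        self.mask_enc = nn.Conv2d(1, hidden, 3, padding=1)
        self.zero_conv = nn.Conv2d(hidden, 3, 1)    # zero-init residual, as in ControlNet
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, x, delta_force, contact_mask):
        residual = self.zero_conv(self.mask_enc(contact_mask))
        return self.base(x + residual, delta_force)

# Shape check with dummy tensors.
model = PositionControlBranch(ForceConditionedDenoiser())
x = torch.randn(1, 3, 64, 64)              # background-removed reference image
dF = torch.tensor([[0.0, 0.0, 2.5]])       # target force change (N)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 20:40, 20:40] = 1.0              # desired contact region
print(model(x, dF, mask).shape)            # torch.Size([1, 3, 64, 64])
```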

Here, we demonstrate how to annotate the contact mask to represent the contact position.

Contact Mask Annotation
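One plausible way to obtain such a binary contact mask is background subtraction against a no-contact reference frame; the sketch below uses OpenCV and is an assumption, not necessarily the annotation procedure shown above.

```python
# Hypothetical contact-mask extraction via background subtraction; the paper's
# exact annotation procedure may differ.
import cv2
import numpy as np

def contact_mask(tactile_bgr: np.ndarray, background_bgr: np.ndarray,
                 thresh: int = 15) -> np.ndarray:
    """Return a binary mask (H, W) of pixels that differ from the no-contact image."""
    diff = cv2.absdiff(tactile_bgr, background_bgr)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    # Clean up sensor noise with a small morphological opening + closing.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask

# Usage: mask = contact_mask(cv2.imread("press.png"), cv2.imread("flat.png"))
```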

Visualization

Qualitative Comparison

We conduct a qualitative comparison between ControlTac and other generators and simulators. ControlTac exhibits superior realism, variation, and controllability in the generated tactile images.

Comparison with other methods

Comparison with Baseline Models


The first column shows 3D previews of six objects, followed by the input tactile image (Ref. Image) in the second column and the Contact Mask in the third column. The fourth column displays the initial force (top) and target force (bottom). Subsequent columns present the Ground Truth (G.T.) and results from ControlTac, the hybrid force-position conditional diffusion model (Hybrid), the separate-control pipeline (Separate), and simulation results from Taxim (Si & Yuan, 2022). In the upper part, we visualize the generated images for comparison; in the lower part, we show the error maps highlighting differences from the ground-truth tactile image.

Baseline comparison

Force-Controlled and Position-Controlled Generation

The figure below showcases the generation results of force-controlled and position-controlled components in ControlTac.

Force and position control demonstration

Diversity of Generated Tactile Images

The figure below clearly demonstrates that ControlTac can generate a diverse range of tactile images from a single reference tactile image.

Diversity demonstration

Downstream Tasks

Force Estimation

The figure below shows that ControlTac covers variations in both position and force, and markedly reduces MAE even with small real subsets. With only a third of the real data, augmented training reaches performance competitive with the full dataset, whereas training on real data alone performs much worse because it cannot cover all variations of forces and positions. Notably, combining all real and generated data performs slightly worse than using only real data; this is because FeelAnyForce already achieves near-oracle performance with full force and position coverage, even though such coverage is challenging to collect in the real world.

Force estimation results
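As a sketch of how such mixed training sets might be assembled (the dataset objects and mixing ratio below are illustrative, not the paper's exact protocol):

```python
# Illustrative mixing of a small real subset with ControlTac-generated samples
# for force-estimation training.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Subset

def build_training_set(real_ds, generated_ds, real_fraction: float = 1 / 3):
    """Keep a fraction of the real data and fill the rest with generated data."""
    n_real = int(len(real_ds) * real_fraction)
    real_subset = Subset(real_ds, torch.randperm(len(real_ds))[:n_real].tolist())
    return ConcatDataset([real_subset, generated_ds])

# loader = DataLoader(build_training_set(real_ds, gen_ds), batch_size=64,
#                     shuffle=True)
# Train a regressor on (image, force) pairs and report MAE on a held-out real set.
```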

We further validate the effectiveness of ControlTac in real-world pushing experiments. A force estimator trained only on generated tactile data achieves performance comparable to one trained on real tactile data, demonstrating that the generated data is realistic and reliable enough to be used directly for training in practical scenarios.

Real-world pushing experiments

Position Estimation

As shown in the table below, pose estimators trained solely on tactile images generated by ControlTac achieve strong performance across all objects, including the unseen T-shape and the USB evaluated with a new sensor sample. Remarkably, training on the same amount of generated data outperforms training on real data alone, even when the real dataset is relatively large, because capturing tactile data that fully covers all contact variations in the dynamic real world is extremely challenging. In such cases, generated data proves particularly valuable, since images at any desired contact position can be generated.

Furthermore, ControlTac not only outperforms simulation-based data from Taxim (Si & Yuan, 2022), whose simulated images are less realistic, but also surpasses traditional PCA-based pose estimation methods (She et al., 2021). We also evaluate the pose estimator under varying versus fixed forces (denoted “fixed” in the table, with the force set to the median value of 6.5 N). Results show that varying the force improves performance, as it covers the force variations encountered in real-world scenarios.

Pose estimation comparison
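Since any contact configuration can be synthesized, covering the pose and force space reduces to sampling generation conditions. A minimal sketch follows; the ranges are purely illustrative assumptions (the 2–11 N span is chosen only so its median matches the 6.5 N fixed-force baseline):

```python
# Hypothetical sampling of generation conditions to cover contact variations;
# image size and force range are illustrative, not the paper's values.
import random

def sample_condition(img_size=(240, 320), f_range=(2.0, 11.0)):
    """Draw a random contact pose and normal force to condition ControlTac on."""
    u = random.uniform(0, img_size[1])      # contact center, x (px)
    v = random.uniform(0, img_size[0])      # contact center, y (px)
    theta = random.uniform(0, 360)          # in-plane rotation (deg)
    fz = random.uniform(*f_range)           # normal force magnitude (N)
    return {"center": (u, v), "angle": theta, "force": (0.0, 0.0, fz)}

# conditions = [sample_condition() for _ in range(10_000)]
```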

To further evaluate the performance of the pose estimator trained with ControlTac-generated data, we conducted a real-time pose tracking experiment. Our model successfully tracked poses at a frequency of 10 Hz, highlighting its practicality in dynamic real-world scenarios.

Real-time pose tracking
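For reference, holding a 10 Hz estimation rate only requires a simple fixed-period loop; `sensor_read` and `estimate_pose` below are hypothetical placeholders for the sensor interface and the trained estimator.

```python
# Illustrative fixed-rate loop for real-time tactile pose tracking; the
# sensor_read and estimate_pose callables are hypothetical placeholders.
import time

def track(sensor_read, estimate_pose, hz: float = 10.0, steps: int = 100):
    period = 1.0 / hz
    for _ in range(steps):
        t0 = time.monotonic()
        pose = estimate_pose(sensor_read())   # one forward pass per cycle
        yield pose
        # Sleep off the remainder of the cycle to hold the target rate.
        time.sleep(max(0.0, period - (time.monotonic() - t0)))
```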

In the Precise Insertion task, the pose estimator trained with ControlTac-generated data achieved success rates of 90% on the cylinder and 85% on the cross. Notably, it achieved success rates of 85% on the unseen T-shape and 75% on the Type-C connector.

Precise insertion task results

Object Classification

In the object classification task, we find that ControlTac-based data augmentation yields significantly better performance than traditional augmentation methods—whether with a simple CNN classifier, a ViT trained from scratch, or a ViT pretrained on ImageNet.

Note: G = geometric augmentation; C = color augmentation; Gen = our ControlTac-based augmentation method.

Object classification results
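For reference, the “G” and “C” baselines could be realized with standard torchvision transforms along these lines (the parameters are illustrative assumptions; the exact augmentations used are not specified above):

```python
# Illustrative geometric (G) and color (C) augmentation baselines.
from torchvision import transforms

geometric = transforms.Compose([            # G: geometric augmentation
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
])
color = transforms.Compose([                # C: color augmentation
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
])
# "Gen" instead draws extra tactile images from ControlTac at sampled forces
# and contact positions, then trains the classifier on the combined set.
```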

BibTeX

@article{luo2025controltac,
  title={ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image},
  author={Luo, Dongyu and Yu, Kelin and Shahidzadeh, Amir-Hossein and Fermuller, Cornelia and Aloimonos, Yiannis and Gao, Ruohan},
  journal={arXiv preprint arXiv:2505.20498},
  year={2025}
}