ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image

ICCV 2025 CDEL Workshop (Oral)

University of Maryland
* Indicates equal contribution
†Dongyu is affiliated with The University of Hong Kong. The work was done during an internship at the University of Maryland.
ControlTac Teaser

At a Glance: Starting from a single reference image, ControlTac generates thousands of augmented tactile images with controllable contact forces and positions (Left). These generated images prove highly effective for downstream tasks (Middle) and demonstrate practical utility in real-world robotic experiments (Right).

Abstract

Vision-based tactile sensing is widely used in perception, reconstruction, and robotic manipulation, yet collecting large-scale tactile data remains costly due to diverse sensor-object interactions and inconsistencies across sensor instances. Existing approaches to scaling tactile data—simulation and free-form tactile generation—often yield unrealistically rendered signals with poor transfer to highly dynamic real-world tasks. We propose ControlTac, a two-stage controllable framework that generates realistic tactile images conditioned on a single reference tactile image, contact force, and contact position. By grounding generation in these important physical priors, ControlTac produces realistic samples that effectively capture task-relevant variations. Across three downstream tasks and three real-world experiments, the augmented datasets using our approach consistently improve performance and demonstrate practical utility in dynamic real-world settings.

Method Overview

ControlTac consists of two key components. (a) Force-Control Generation: the sensor background $B$ is subtracted from a raw tactile image $x'$ to yield a reference image $x$, and a generative module then synthesizes an intermediate force-adjusted image $y_{Int}$ conditioned on a target relative 3D force $\Delta F$. (b) Pose-Control Generation: a contact mask is extracted from the tactile sensor and encoded by a spatial conditioning network as a spatial prior, which modulates the generation process so that the final output image $y$ accurately reflects both the desired force and the specific target contact pose.
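As a rough illustration of the preprocessing in step (a), the sketch below subtracts a background frame from a raw tactile image and forms a relative force vector. The function names, the toy image sizes, and the reading of $\Delta F$ as "target force minus the reference image's force" are assumptions made for illustration, not details confirmed by the paper.

```python
import numpy as np

def make_reference(raw, background):
    """Subtract the no-contact background B from a raw tactile image x'."""
    # Cast to a signed float type so negative differences do not wrap around.
    return raw.astype(np.float32) - background.astype(np.float32)

def relative_force(initial_force, target_force):
    """Assumed definition: target 3D force minus the reference image's force."""
    initial = np.asarray(initial_force, dtype=np.float32)
    target = np.asarray(target_force, dtype=np.float32)
    return target - initial

# Toy data: a uniform background and a darker 2x2 contact patch.
background = np.full((4, 4, 3), 120, dtype=np.uint8)
raw = background.copy()
raw[1:3, 1:3] = 100

x = make_reference(raw, background)                      # reference image
delta_f = relative_force([0.0, 0.0, 2.0], [0.5, 0.0, 6.0])  # relative force
```

Working in a signed float type matters here because contact regions can be darker than the background, which would wrap around to large values in unsigned 8-bit arithmetic.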

ControlTac Framework

The figure below shows how the contact mask is annotated to represent the contact position.

Contact Mask Annotation

Visualization

Qualitative Comparison

We conduct a qualitative comparison between ControlTac and other generators and simulators. ControlTac exhibits superior realism, variation, and controllability in the generated tactile images.

Comparison with other methods

Comparison with Baseline Models

Each object is shown with its 3D preview (col. 1), input tactile images (col. 2), and contact masks (col. 3). Col. 4 presents the initial (top) and target (bottom) forces. Subsequent columns show the Ground Truth, ControlTac results, and simulator results from Taxim (Si & Yuan, 2022), along with corresponding error maps. (a) The first six rows show seen objects under novel poses/forces from FeelAnyForce (Shahidzadeh et al., 2025), while the last two rows show unseen objects from FeelAnyForce (Shahidzadeh et al., 2025). (b) Unseen objects from AnyTouch 2 (Feng et al., 2026). (c) A failure case for an unseen object from FeelAnyForce (Shahidzadeh et al., 2025).
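For readers unfamiliar with error maps, one common way to compute them is the per-pixel absolute difference between the ground-truth and generated images. The sketch below assumes that definition and averages over color channels; the exact formula used for the figure is not stated here, so treat this as illustrative only.

```python
import numpy as np

def error_map(ground_truth, generated):
    """Per-pixel absolute error, averaged over the color channels."""
    gt = ground_truth.astype(np.float32)
    gen = generated.astype(np.float32)
    return np.abs(gt - gen).mean(axis=-1)

# Toy example: the images differ in one pixel, in one channel only.
gt = np.zeros((2, 2, 3), dtype=np.uint8)
gen = gt.copy()
gen[0, 0] = (30, 0, 0)

err = error_map(gt, gen)  # 2x2 map, non-zero only at pixel (0, 0)
```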

Baseline comparison

Force-Controlled and Position-Controlled Generation

The figure below showcases the generation results of force-controlled and position-controlled components in ControlTac.

Force and position control demonstration

Diversity of Generated Tactile Images

The figure below clearly demonstrates that ControlTac can generate a diverse range of tactile images from a single reference tactile image.

Diversity demonstration

Downstream Tasks

Force Estimation

The figure below shows that ControlTac covers variations in position and force and substantially improves MAE even with small real subsets. With only a third of the real data, augmented with generated data, performance is competitive with training on the full dataset, whereas training on the real subset alone performs much worse because it cannot cover all variations in force and position. Note that combining all real and generated data performs slightly worse than using only real data. This is because the full FeelAnyForce dataset already achieves near-oracle performance with complete coverage of forces and positions, although such coverage is challenging to collect in the real world.
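For reference, MAE (mean absolute error) can be computed as in the minimal sketch below. Averaging over all three force axes is an assumption; the results above may use a different aggregation, and the sample values are made up for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute error over all samples and all force axes."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    count = 0
    for pred_force, true_force in zip(predicted, actual):
        for pred_axis, true_axis in zip(pred_force, true_force):
            total += abs(pred_axis - true_axis)
            count += 1
    return total / count

# Hypothetical 3D forces in newtons: (Fx, Fy, Fz).
predicted = [(0.5, 0.0, 5.0), (0.0, 1.0, 7.0)]
actual = [(0.0, 0.0, 6.0), (0.0, 0.5, 7.0)]

mae = mean_absolute_error(predicted, actual)
```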

Force estimation results

We further validate the effectiveness of ControlTac in real-world pushing experiments. The force estimator trained only with generated tactile data achieves comparable performance to the one trained on real tactile data, demonstrating that the generated data is realistic and reliable enough to be used directly for training in practical scenarios.

Real-world pushing experiments

Position Estimation

As shown in the table below, pose estimators trained solely on tactile images generated by ControlTac achieve strong performance across all objects, including the unseen T Shape and USB with the new sensor sample. Remarkably, training on the same amount of generated data outperforms training on real data alone, even when the real dataset is relatively large, because capturing tactile data that fully covers all contact variations in the dynamic real world is extremely challenging. In such cases, generated data proves particularly valuable since every position that needs to be covered can be generated.

Furthermore, ControlTac not only outperforms simulation-based data from Taxim (Si & Yuan, 2022), where simulated images are not realistic, but also surpasses traditional PCA-based (She et al., 2021) pose estimation methods. We also evaluate the pose estimator under varying versus fixed forces (denoted as “fixed” in the table, with the force set to the median value of 6.5 N). Results show that varying the force improves performance since it covers the force variations found in real-world scenarios.
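The difference between the fixed and varying settings can be pictured as a choice of how force conditions are drawn when generating training images. The sketch below is a hypothetical illustration: the uniform sampling, the function name, and the 1 N to 12 N range are assumptions, and only the 6.5 N median comes from the text.

```python
import random

MEDIAN_FORCE_N = 6.5  # fixed value quoted in the text

def sample_force_conditions(n, force_range=None, seed=0):
    """Return n force values: all fixed at the median, or drawn uniformly."""
    if force_range is None:
        # "Fixed" setting: every generated image uses the same force.
        return [MEDIAN_FORCE_N] * n
    low, high = force_range
    rng = random.Random(seed)  # seeded for reproducibility
    return [rng.uniform(low, high) for _ in range(n)]

fixed = sample_force_conditions(5)
# Hypothetical range, chosen only so that its midpoint is 6.5 N.
varying = sample_force_conditions(5, force_range=(1.0, 12.0))
```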

Pose estimation comparison

To further evaluate the performance of the pose estimator trained with ControlTac-generated data, we conducted a real-time pose tracking experiment. Our model successfully tracked poses at a frequency of 10 Hz, highlighting its practicality in dynamic real-world scenarios.

Real-time pose tracking

In the Precise Insertion task, the pose estimator trained with ControlTac-generated data achieved success rates of 90% on the cylinder and 85% on the cross. Notably, it achieved success rates of 85% on the unseen T-shape and 75% on the Type-C connector.

Precise insertion task results

Object Classification

In the object classification task, augmenting the data with ControlTac yields significantly better performance than traditional augmentation methods, whether with a simple CNN classifier, a ViT trained from scratch, or a ViT pretrained on ImageNet.

Note: G = geometric augmentation; C = color augmentation; Gen = our ControlTac-based augmentation method.
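To make the G and C baselines concrete, the sketch below shows one simple example of each kind of traditional augmentation: a random horizontal flip and a random brightness scale. The specific transforms and parameters used in the experiments are not given here, so these choices are assumptions for illustration.

```python
import numpy as np

def geometric_augment(image, rng):
    """Randomly flip the image horizontally (one simple geometric transform)."""
    return image[:, ::-1] if rng.random() < 0.5 else image

def color_augment(image, rng, max_scale=0.2):
    """Scale brightness by a random factor in [1 - max_scale, 1 + max_scale]."""
    scale = 1.0 + rng.uniform(-max_scale, max_scale)
    scaled = image.astype(np.float32) * scale
    # Clip back to the valid 8-bit range before converting.
    return np.clip(scaled, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 100, dtype=np.uint8)

# Apply G then C, as in a combined "G + C" augmentation pipeline.
augmented = color_augment(geometric_augment(image, rng), rng)
```

Unlike these transforms, which only perturb an existing image, the Gen setting synthesizes new tactile images under different forces and contact positions.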

Object classification results

BibTeX

@article{luo2025controltac,
  title={ControlTac: Force- and Position-Controlled Tactile Data Augmentation with a Single Reference Image},
  author={Luo, Dongyu and Yu, Kelin and Shahidzadeh, Amir-Hossein and Fermuller, Cornelia and Aloimonos, Yiannis and Gao, Ruohan},
  journal={arXiv preprint arXiv:2505.20498},
  year={2025}
}