Connecting Touch and Vision via Cross-Modal Prediction

Yunzhu Li      Jun-Yan Zhu      Russ Tedrake      Antonio Torralba



Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs and of imagining how we interact with objects given tactile data as input. To accomplish these goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results for different system designs, and we visualize the learned representations of our model.
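The abstract describes the model only as a conditional adversarial model. For readers unfamiliar with that family, the sketch below shows a generic pix2pix-style conditional-GAN objective, in which a generator maps one modality to the other (for example, a visual frame to a tactile image) and is trained with an adversarial term plus an L1 reconstruction term. This is an assumption for illustration, not the authors' published loss: the function names, the weight `lam=100.0` (the pix2pix default), and the flattened-list image representation are hypothetical, and the scale and location conditioning described above is not modeled.

```python
import math
from typing import Sequence

# Hypothetical helpers: a generic conditional-GAN objective (pix2pix-style),
# NOT the authors' exact loss. Images are flattened lists of pixel values, and
# discriminator outputs are probabilities that a (condition, image) pair is real.

EPS = 1e-12  # guards against log(0)


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between a predicted and a ground-truth image."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy: push D(x, y) toward 1 and D(x, G(x)) toward 0."""
    return -(math.log(max(d_real, EPS)) + math.log(max(1.0 - d_fake, EPS)))


def generator_loss(d_fake: float, pred: Sequence[float],
                   target: Sequence[float], lam: float = 100.0) -> float:
    """Non-saturating adversarial term -log D(x, G(x)) plus lam * L1(G(x), y).

    lam = 100.0 is the pix2pix default, assumed here for illustration.
    """
    adversarial = -math.log(max(d_fake, EPS))
    return adversarial + lam * l1_loss(pred, target)


if __name__ == "__main__":
    # Toy example: a predicted 4-pixel tactile image vs. its ground truth.
    pred = [0.1, 0.4, 0.5, 0.9]
    target = [0.0, 0.5, 0.5, 1.0]
    print(round(l1_loss(pred, target), 3))  # mean |diff| = 0.3 / 4
    print(round(generator_loss(0.5, pred, target), 3))
    print(round(discriminator_loss(0.9, 0.2), 3))
```

In a conditional GAN, the conditioning signal (here, the input modality, possibly together with information about where the touch occurs) is given to both the generator and the discriminator, so the discriminator judges whether a (condition, output) pair is plausible rather than whether an output looks realistic in isolation. The L1 term keeps predictions close to the ground truth, while the adversarial term discourages the blurry outputs that a reconstruction loss alone tends to produce.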


Dataset Examples



Vision to Touch
Green: Ground Truth, Red: Prediction


Touch to Vision
Green: Ground Truth, Red: Prediction


Related Work

Wenzhen Yuan, Siyuan Dong, and Edward H. Adelson
GelSight: High-Resolution Robot Tactile Sensors for Estimating Geometry and Force
Sensors, 2017

Wenzhen Yuan, Shaoxiong Wang, Siyuan Dong, and Edward H. Adelson
Connecting Look and Feel: Associating the Visual and Tactile Properties of Physical Materials
CVPR 2017

Roberto Calandra, Andrew Owens, Manu Upadhyaya, Wenzhen Yuan, Justin Lin, Edward H. Adelson, and Sergey Levine
The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?
CoRL 2017

Shaoxiong Wang*, Jiajun Wu*, Xingyuan Sun, Wenzhen Yuan, William T. Freeman, Joshua B. Tenenbaum, and Edward H. Adelson
3D Shape Perception from Monocular Vision, Touch, and Shape Priors
IROS 2018

Subramanian Sundaram, Petr Kellnhofer, Yunzhu Li, Jun-Yan Zhu, Antonio Torralba, and Wojciech Matusik
Learning the Signatures of the Human Grasp Using a Scalable Tactile Glove
Nature, 569 (7758), 2019