"Meta says that its model, ImageBind,…

"Meta says that its model, ImageBind, is the first to combine six types of data into a single embedding space. The six types of data included in the model are: visual (in the form of both image and video); thermal (infrared images); text; audio; depth information; and — most intriguing of all — movement readings generated by an inertial measuring unit, or IMU. (IMUs are found in phones and smartwatches, where they’re used for a range of tasks, from switching a phone from landscape to portrait to distinguishing between different types of physical activity.)"

https://www.theverge.com/2023/5/9/23716558/meta-imagebind-open-source-multisensory-modal-ai-model-research