Content found on the World Wide Web, and especially in social media, is increasingly composed of a mix of visual, textual and speech data, opening up many opportunities for cross-media search and web mining. This situation demands solutions for multi-media and cross-media processing of inherently multi-modal data, involving the development of technologies that combine insights from text mining, computer vision, search and retrieval, and Web data processing. In addition, the processing should be made scalable and adaptable to different domains and data sources, as in this context we are typically dealing with "big data", which is often user-generated and unstructured.
For instance, fragments of natural language, in the form of tags, captions, subtitles, surrounding text or audio, can aid the interpretation of image and video data by adding context or disambiguating visual appearance. In addition, labeled images are essential for training object or activity classifiers. Conversely, visual data can help resolve challenges in language processing, such as the disambiguation of person names, places and events. Studying language and vision together can also provide new insight into cognition and into universal representations of knowledge and meaning for use in a Semantic Web context. Moreover, multi-modal and cross-modal search and retrieval, which combine visual and textual modalities, are becoming increasingly popular.
The purpose of this workshop is to bring together researchers from computer vision, Web search and mining, human language technology, computational linguistics, machine learning, reasoning, information retrieval, cognitive science and application communities. The workshop will serve as a strong interdisciplinary forum, sparking cross-fertilizing discussions and ideas on how to combine and integrate established techniques from different (but related) fields into new unified modeling approaches, as well as how to approach the problem of big multi-modal data on the Web from a completely new angle. This initiative on integrating vision and text will naturally yield a better understanding of the nature and usability of the vast multi-modal data available online.
Submission deadline: February 22, 2015 (deadline 23:59 UTC-10).
For more information, see the workshop website.