Call for papers: Special Issue IEEE MultiMedia: Vision and Language Integration Meets Multimedia Fusion
Multimodal information fusion at both the signal and the semantic levels is a core part of most multimedia applications, including multimedia indexing, retrieval, summarization and others. Early or late fusion of modality-specific processing results has been addressed in multimedia prototypes since their earliest days, through methodologies including rule-based approaches, information-theoretic models, and machine learning. Vision and language are two of the predominant modalities being fused, and they have attracted special attention in international challenges with a long history of results, such as TRECVID, ImageCLEF and others. During the last decade, vision-language semantic integration has also attracted attention from research communities that have traditionally not been interdisciplinary, such as Computer Vision and Natural Language Processing. This is because one modality can greatly assist the processing of another, providing cues for disambiguation, complementary information, and noise/error filtering.
The recent boom in deep learning methods has opened up new directions in the joint modelling of visual and co-occurring verbal information in multimedia discourse. This evolution provides an opportunity to study the fusion of language and visual data in a deep learning framework, which may in turn require adaptations or novel approaches from a machine learning point of view. Ideally, such approaches deal adaptively with highly diverse and unstructured language and visual data that can be complementary, redundant, noisy, of differing certainty, and even contradictory, possibly leading to novel fusion-based learning systems.
The topics of the special issue lie at the core of multimedia-related basic and applied research; they focus on the fusion/integration of vision and language in multimedia discourse, including multisensory human-computer/human-robot interaction, as well as in multimedia documents (e.g., audiovisual documents, captioned image archives, etc.).
These topics are relevant to the multimedia community: through this special issue, the community is invited to take part in the latest developments in integrated language and vision research.
We solicit papers presenting original research, reviews, or opinions/positions related to (but not limited to):
– Models of vision-language integration/fusion at the signal level
– Models of vision-language integration/fusion at the semantic level
– Reviews of vision-language integration/fusion models
– Reviews of vision-language integration/fusion methodologies
– Reviews of vision-language integration/fusion applications
– Position papers on challenges in vision-language integration/fusion
– Deep learning methods
– Cognitive modeling
– Distributional semantics
Integration of the models in multimedia applications:
– Multimedia search, retrieval and question answering
– Multimodal and cross-modal querying and search
– Multimedia annotation and indexing
– Multimedia recommendation
– Multimedia summarization
– Multimodal translation between language and vision
– Human-computer interaction
– Human-robot interaction
Covering different domains, e.g.:
– Cultural heritage
– Social media
We focus both on theoretical models for the integration and fusion of multimodal data, more specifically language and vision data, and on practical applications of these models.
See https://www.computer.org/web/peer-review/magazines for general author guidelines. Submissions should not exceed 6,500 words, including all text, the abstract, keywords, bibliography, biographies, and table text. Each table and figure counts for 200 words.
Manuscripts should be submitted electronically (https://mc.manuscriptcentral.com/mm-cs), selecting this special issue option.
The submission deadline is June 1, 2017. Publication is planned for the April-June 2018 issue of IEEE MultiMedia.
Guest editors: Marie-Francine Moens, Katerina Pastra, Kate Saenko, Tinne Tuytelaars