The 5th Workshop on Vision and Language

11th August, 2016 - 12th August, 2016

Research involving both language and vision computing spans a variety of disciplines and applications, and goes back a number of decades. More recently, the big data era has thrown up a multitude of tasks in which vision and language are inherently linked. The explosive growth of visual and textual data, both online and in private repositories held by diverse institutions and companies, has led to urgent requirements in terms of search, processing and management of digital content. Effective solutions for accessing or mining such data depend on the connection between visual and textual content being made interpretable, and hence on the ‘semantic gap’ between vision and language being bridged.

One perspective has been the integrated modelling of language and vision, with approaches spanning a spectrum from structured, cognitive modelling at one end to unsupervised machine learning at the other. State-of-the-art results in many areas are currently being produced at the latter end, in particular by deep learning approaches.

Another perspective is exploring how knowledge about language can help with predominantly visual tasks, and vice versa. Visual interpretation can be aided by text associated with images/videos and knowledge about the world learned from language. On the NLP side, images can help ground language in the physical world, allowing us to develop models for semantics. Words and pictures are often naturally linked online and in the real world, and each modality can provide reinforcing information to aid the other.

The 5th Workshop on Vision and Language (VL’16) aims to address all the above, with a particular focus on the integrated modelling of vision and language. We welcome papers describing original research combining language and vision. To encourage the sharing of novel and emerging ideas we also welcome papers describing new data sets, grand challenges, open problems, benchmarks and work in progress as well as survey papers.

Topics of interest include (in alphabetical order), but are not limited to:
• Computational modelling of human vision and language
• Computer graphics generation from text
• Human-computer interaction in virtual worlds
• Human-robot interaction
• Image and video description and summarization
• Image and video labeling and annotation
• Image and video retrieval
• Language-driven animation
• Machine translation with visual enhancement
• Medical image processing
• Models of distributional semantics involving vision and language
• Multi-modal discourse analysis
• Multi-modal human-computer communication
• Multi-modal temporal and spatial semantics recognition and resolution
• Recognition of narratives in text and video
• Recognition of semantic roles and frames in text, images and video
• Retrieval models across different modalities
• Text-to-image generation
• Visual question answering / visual Turing challenge
• Visually grounded language understanding

Call for Papers to follow.

Important Dates:

10 January 2016: First Call for Workshop Papers
8 May 2016: Workshop Paper Due Date
5 June 2016: Notification of Acceptance
22 June 2016: Camera-ready papers due
11-12 August 2016: Workshop Dates


Organisers:

Anya Belz, University of Brighton, UK
Erkut Erdem, Hacettepe University, Turkey
Katerina Pastra, CSRI and ILSP Athena Research Center, Athens, Greece
Krystian Mikolajczyk, Imperial College London, UK


ACL 2016
Berlin, Germany

