JNLE Special Issue on Closing the Gap between Language and Vision

Yannick Versley

Journal of Natural Language Engineering, Cambridge University Press

Call for Contributions
Research involving both language and vision computing spans a large variety of disciplines and applications, and goes back at least two decades. More recently, the big data era has produced a multitude of tasks in which vision and language are inherently linked. The explosive growth of visual and textual data, both online and in private repositories owned by diverse institutions and companies, has led to urgent requirements for the search, processing and management of digital content. Solutions for accessing or mining such data effectively depend on the connection between visual and textual content being made interpretable, and hence on the semantic gap between vision and language being bridged.

One perspective has been integrated modelling of language and vision, with approaches located at different points between the structured, cognitive modelling end of the spectrum and the unsupervised machine learning end. State-of-the-art results in many areas are currently being produced at the latter end, in particular by deep learning approaches.

Another perspective is exploring how knowledge about language can help with predominantly visual tasks, and vice versa. Visual interpretation can be aided by text associated with images/videos and knowledge about the world learned from language. On the NLP side, images can help ground language in the physical world, allowing us to develop models for semantics. Words and pictures are often naturally linked online and in the real world, and each modality can provide reinforcing information to aid the other.

Visual recognition methods are now reaching a level of maturity where commercial deployment is becoming feasible for an increasingly wide range of applications. At the same time, recent years have witnessed a marked increase in research focusing on the language and vision area, intensifying in particular in the past five years, and it can be argued that the language computing and vision computing fields now overlap for the first time to form the beginnings of a genuinely interdisciplinary research field. This is the perfect moment for a special issue with an emphasis on the applications for which language and vision research is now producing high-quality solutions. A carefully chosen, representative selection of in-depth reports on the best current research in language and vision will provide the perfect snapshot of the state of the art in a field that has just experienced five extraordinarily active and productive years.


While there has been research involving both language and vision for some time, it was not until about five years ago that it began to gel into an interdisciplinary research field. Early indicators were (i) the funding of the UK EPSRC Network on Vision and Language in 2010 and of the European COST Action on Integrating Vision and Language in 2013; and (ii) the organisation of the first workshops dedicated to the topic of vision and language, including the 1st Workshop on Vision and Language in 2011 and the NIPS 2011 Workshop on Integrating Language and Vision.

Since then the subject area has grown rapidly; there has been a proliferation of language and vision workshops (there were at least seven in 2015 and 2016); and in 2015 all the major NLP conferences, ACL’15, EMNLP’15 and NAACL’15, introduced the subject area of Language and Vision for the first time. This expansion has stimulated a lot of exciting new research on a wide variety of language and vision topics, a good proportion of which is now reaching a level of maturity where journal articles are the most appropriate form of publication.

At this point in time, there is a lot of new Language and Vision research that has been mainly, if not exclusively, published in conference and workshop proceedings. The time is right for a journal special issue to provide an overview of this new generation of cutting-edge work, through a carefully selected, representative collection of in-depth reports on the best mature research in the new interdisciplinary Language and Vision field.

We invite the submission of contributions reporting completed research on any topics related to the above, including but not limited to the following:

  • Image and video labelling and annotation
  • Image and video description
  • Computational modelling of human vision and language
  • Image and video retrieval
  • Multimodal human-computer communication
  • Text-to-image generation
  • Language-driven animation
  • Facial animation for speech
  • Assistive methodologies


Articles should be about 20 pages long and follow the standard format and instructions for JNLE submissions, as described here: http://assets.cambridge.org/NLE/NLE_ifc.pdf

LaTeX style files: https://mc.manuscriptcentral.com/societyimages/nle/NLE_LaTeX_Style_File.zip

To submit your article, please go to the following page and select the special issue title: https://mc.manuscriptcentral.com/nle

Important Dates

First call for papers: 22 October 2016
Submission deadline: 31 January 2017
Reviews deadline: 15 March 2017
Selection of articles and submission of Special Issue to JNLE: 30 April 2017
Prospective publication of Special Issue: Autumn 2017

Guest Editors

Anja Belz, University of Brighton, UK
Tamara Berg, UNC Chapel Hill, USA
Katerina Pastra, Cognitive Systems Research Institute (CSRI), Athens, Greece

Guest Editorial Board

Yannis Aloimonos, University of Maryland, USA
Marco Baroni, Trento University, Italy
Yejin Choi, University of Washington, USA
Trevor Darrell, Berkeley, USA
Pinar Duygulu, Hacettepe University, Turkey
David Forsyth, University of Illinois at Urbana-Champaign, USA
Gregory Grefenstette, INRIA Saclay, France
Julia Hockenmaier, University of Illinois at Urbana-Champaign, USA
David Hogg, University of Leeds, UK
John Kelleher, DCU, Ireland
Frank Keller, University of Edinburgh, UK
Mirella Lapata, University of Edinburgh, UK
Krystian Mikolajczyk, Imperial College London, UK
Margaret Mitchell, Microsoft Inc, USA
Ray Mooney, University of Texas at Austin, USA
Alan Smeaton, DCU, Ireland
Richard Socher, MetaMind Inc, USA
Tinne Tuytelaars, University of Leuven, Belgium


Email: jnle.vl.guesteditors@gmail.com