Multi-view Lip-reading/Audiovisual Challenges @ ACCV 2016

20th November, 2016

Human speech perception is known to be a bimodal process that draws on both acoustic and visual information. There is clear evidence that visual cues also play an important role in automatic speech recognition: in audiovisual speech recognition (AVSR) when the audio is seriously corrupted by noise, and in automatic lip-reading (ALR) when the audio is unavailable altogether.

This workshop aims to challenge researchers to deal with the large variations in speakers’ appearance caused by camera-view changes in the context of ALR/AVSR. To this end, we have collected a multi-view audiovisual database named ‘OuluVS2’ [1], which includes 52 speakers producing both discrete and continuous utterances, recorded simultaneously from 5 different camera views. To assist participants, we have preprocessed most of the data to extract the regions of interest, that is, a rectangular area containing the talking mouth. The cropped mouth videos are available to researchers together with the original ones.

[1] I. Anina, Z. Zhou, G. Zhao and M. Pietikainen (2015) OuluVS2: a multi-view audiovisual database for non-rigid mouth motion analysis. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (FG), pages 1-5, Ljubljana, Slovenia.


ACCV 2016
Taiwan