Data Resources Repository

Details/Description:

The Verb Senses in Images (VerSe) dataset contains 3,518 images, each annotated with one of 90 verbs and with the OntoNotes sense realized for that verb in the image. The images are taken from two existing multimodal datasets (COCO and TUHOI).

Link:

https://github.com/spandanagella/verse

Details/Description:

An extension of the Flickr30K dataset with (i) German translations of the existing English image descriptions and (ii) German descriptions of the same images, created independently of the English ones.

Link:

http://www.statmt.org/wmt16/multimodal-task.html

Details/Description:

Turkish image descriptions for the Flickr8k dataset.

Link:

http://tasviret.cs.hacettepe.edu.tr

Details/Description:

Datasets associated with the ChaLearn challenges, all of which centre on the theme of "Looking at People".

Link:

http://gesture.chalearn.org

Details/Description:

The Multi30K dataset extends the Flickr30K dataset with (i) 31K German translations of a subset of the English descriptions, produced by professional translators, and (ii) 155K German descriptions crowdsourced independently of the original English descriptions. Paper: https://arxiv.org/abs/1605.00459

Link:

http://www.statmt.org/wmt16/multimodal-task.html

Details/Description:

AllenAI's Charades dataset is composed of 9,848 videos of daily indoor activities, collected through Amazon Mechanical Turk. A total of 267 users were each presented with a sentence that includes objects and actions from a fixed vocabulary, and recorded a video acting out that sentence (as in a game of Charades).

Link:

http://allenai.org/plato/charades/

Details/Description:

Google YouTube-8M is a large-scale labeled video dataset consisting of 8 million YouTube video IDs and associated labels from a diverse vocabulary of 4,800 visual entities. It also comes with precomputed state-of-the-art vision features from billions of frames, which fit on a single hard disk. This makes it possible to train video models on hundreds of thousands of hours of video in less than a day on a single GPU.

Link:

https://research.google.com/youtube8m/

Details/Description:

State-of-the-art survey on activity recognition, including the generation of semantic descriptions of videos.

Link:

http://www-sop.inria.fr/members/Francois.Bremond/Postscript/iVL_ActivityRecognition_Survey.pdf

Details/Description:

Illinois image description data (Hockenmaier et al.)

Link:

http://nlp.cs.illinois.edu/HockenmaierGroup/data.html

Details/Description:

Generalized 1M image-caption corpus (Kuznetsova et al.)

Link:

http://www3.cs.stonybrook.edu/~pkuznetsova/imgcaption/