http://vision.cs.uiuc.edu/phrasal/

Recognition using Visual Phrases

Ali Farhadi, Mohammad Amin Sadeghi
University of Illinois at Urbana-Champaign
CVPR'11, Best Student Paper

Abstract

In this paper we introduce visual phrases, complex visual composites like a person riding a horse. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.
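
For readers unfamiliar with the non-maximum suppression step mentioned in the abstract, the following is a minimal sketch of the standard greedy variant. It illustrates the usual baseline only and is not the decoding procedure proposed in the paper. The function names, the box layout (xmin, ymin, xmax, ymax), and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the detections kept by greedy NMS.

    Detections are visited from highest to lowest score; a detection is
    kept only if it does not overlap an already kept one by more than
    iou_threshold (an assumed, illustrative value).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    print(greedy_nms(demo_boxes, demo_scores))
```

In the demo, the second box overlaps the first heavily and has a lower score, so it is suppressed, while the third box is far away and is kept.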

Phrasal Recognition Dataset

This dataset contains 8 object categories from Pascal VOC that are suitable for studying the interactions between objects. The dataset is formatted like the Pascal VOC dataset and is easy to use. This dataset contains:

2769 images

5067 bounding-box annotations

8 objects

17 visual phrases

120 images per visual phrase

1796 bounding boxes for visual phrases

3271 bounding boxes for objects

Objects:

person, bike, car, dog, horse, bottle, sofa, chair

Visual Phrases:

person riding horse, person sitting on sofa, person sitting on chair, person lying on sofa, person lying on beach, person riding bicycle, horse and rider jumping, person next to horse, person next to bicycle, bicycle next to car, person jumping, person next to car, dog lying on sofa, dog running, dog jumping, person running, person drinking from a bottle
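
Because the dataset follows the Pascal VOC layout, an annotation file can probably be read with a few lines of standard XML parsing. The sketch below assumes the usual VOC element names (`object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`). The sample XML, the file name in it, and the label strings are hypothetical: it is assumed here that phrase labels resemble the model names listed on this page (for example `person_riding_horse`), which should be checked against the actual annotation files.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Object categories listed on this page. "bicycle" is included next to
# "bike" as an assumption, since the model files use "bicycle".
OBJECT_LABELS = {"person", "bike", "bicycle", "car", "dog",
                 "horse", "bottle", "sofa", "chair"}

# (label, kind, (xmin, ymin, xmax, ymax))
Record = Tuple[str, str, Tuple[int, int, int, int]]


def parse_voc_annotation(xml_text: str) -> List[Record]:
    """Extract labelled boxes from one VOC-style annotation document."""
    root = ET.fromstring(xml_text)
    records: List[Record] = []
    for obj in root.iter("object"):
        label = (obj.findtext("name") or "").strip()
        bndbox = obj.find("bndbox")
        if bndbox is None:
            continue  # skip entries without a bounding box
        box = tuple(int(float(bndbox.findtext(tag)))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        # Anything that is not a plain object label is treated as a phrase.
        kind = "object" if label in OBJECT_LABELS else "phrase"
        records.append((label, kind, box))
    return records


# Hypothetical annotation, written only to exercise the parser.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>person_riding_horse</name>
    <bndbox><xmin>48</xmin><ymin>30</ymin><xmax>320</xmax><ymax>400</ymax></bndbox>
  </object>
  <object>
    <name>horse</name>
    <bndbox><xmin>60</xmin><ymin>120</ymin><xmax>310</xmax><ymax>398</ymax></bndbox>
  </object>
</annotation>"""

if __name__ == "__main__":
    for label, kind, box in parse_voc_annotation(SAMPLE_XML):
        print(kind, label, box)
```

Separating records by kind in this way makes it straightforward to recount the phrase and object boxes and compare them with the totals listed above.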

Object Detection Models

We have trained models for all 17 visual phrases and 8 objects using the deformable parts model (v4).

Download all models at once (13 MB)

Or download them individually:

bicycle	bottle	car	chair	dog
horse	person	sofa	bicycle_nextto_car	dog_jumping
dog_lying_on_sofa	dog_running	person_drinking_bottle	person_jumping	person_jumping_on_horse
person_lying_in_beach	person_lying_on_sofa	person_nextto_bicycle	person_nextto_car	person_nextto_horse
person_riding_bicycle	person_riding_horse	person_running	person_sitting_on_chair	person_sitting_on_sofa

BibTeX

@inproceedings{VisualPhrases,
 author = {Sadeghi, Mohammad Amin and Farhadi, Ali},
 title = {Recognition using Visual Phrases},
 booktitle = {Computer Vision and Pattern Recognition (CVPR)},
 year = {2011},
}
