Recognition using Visual Phrases

Ali Farhadi, Mohammad Amin Sadeghi
University of Illinois at Urbana-Champaign
CVPR'11, Best Student Paper

The big picture

Abstract

In this paper we introduce visual phrases, complex visual composites like a person riding a horse. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.
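For readers unfamiliar with the decoding step mentioned in the abstract, the following is a minimal sketch of greedy non-maximum suppression, the conventional baseline that the abstract refers to. It is not the decoding procedure proposed in the paper. The function names, the `(xmin, ymin, xmax, ymax)` box layout, and the 0.5 overlap threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept detections, highest score first.

    A detection is dropped when it overlaps an already-kept,
    higher-scoring detection by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Hypothetical detections: the first two overlap heavily.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

Greedy suppression of this kind considers only pairwise overlap and score, which is why the abstract argues for a decoding procedure that also accounts for local context.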

Paper


Download PDF (12MB)

Presentation


Watch the Presentation on techtalks.tv

Phrasal Recognition Dataset

Download Phrasal Recognition Dataset (250MB)
This dataset contains 8 object categories from Pascal VOC that are suitable for studying interactions between objects. It is formatted like the Pascal VOC dataset and is easy to use. The dataset contains:
  • 2769 images
  • 5067 bounding-box annotations
  • 8 objects
  • 17 visual phrases
  • 120 images per visual phrase
  • 1796 bounding boxes for visual phrases
  • 3271 bounding boxes for objects
  • Objects: person, bike, car, dog, horse, bottle, sofa, chair
  • Visual phrases: person riding horse, person sitting on sofa, person sitting on chair, person lying on sofa, person lying on beach, person riding bicycle, horse and rider jumping, person next to horse, person next to bicycle, bicycle next to car, person jumping, person next to car, dog lying on sofa, dog running, dog jumping, person running, person drinking from a bottle
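Because the dataset follows the Pascal VOC layout, its annotations can presumably be read like any VOC XML file. The sketch below parses one such annotation with Python's standard library. The sample XML, the file name `example.jpg`, and the use of labels such as `person_riding_horse` inside `<name>` are assumptions for illustration; check the downloaded files for the actual label strings and directory layout.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical annotation in the standard Pascal VOC XML layout.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>person_riding_horse</name>
    <bndbox>
      <xmin>48</xmin><ymin>30</ymin><xmax>320</xmax><ymax>400</ymax>
    </bndbox>
  </object>
  <object>
    <name>horse</name>
    <bndbox>
      <xmin>60</xmin><ymin>120</ymin><xmax>310</xmax><ymax>400</ymax>
    </bndbox>
  </object>
</annotation>"""

Record = Tuple[str, Tuple[int, int, int, int]]


def parse_voc_annotation(xml_text: str) -> List[Record]:
    """Return (label, (xmin, ymin, xmax, ymax)) for every <object> element."""
    root = ET.fromstring(xml_text)
    records: List[Record] = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # Coordinates are sometimes stored as floats, so convert via float first.
        box = tuple(int(float(bndbox.findtext(tag)))
                    for tag in ("xmin", "ymin", "xmax", "ymax"))
        records.append((label, box))
    return records


if __name__ == "__main__":
    for label, box in parse_voc_annotation(SAMPLE_XML):
        print(label, box)
```

To read a file from disk instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.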

    Object Detection Models

    We have trained models for all 17 visual phrases and 8 objects using the deformable parts model (v4).

  • Download all models at once (13MB)
  • Or download them individually:
    bicycle
    bottle
    car
    chair
    dog
    horse
    person
    sofa
    bicycle_nextto_car
    dog_jumping
    dog_lying_on_sofa
    dog_running
    person_drinking_bottle
    person_jumping
    person_jumping_on_horse
    person_lying_in_beach
    person_lying_on_sofa
    person_nextto_bicycle
    person_nextto_car
    person_nextto_horse
    person_riding_bicycle
    person_riding_horse
    person_running
    person_sitting_on_chair
    person_sitting_on_sofa

    BibTeX

    @inproceedings{VisualPhrases,
     author = {Sadeghi, Mohammad Amin and Farhadi, Ali},
     title = {Recognition using Visual Phrases},
     booktitle = {Computer Vision and Pattern Recognition (CVPR)},
     year = {2011}
    }