Here we compare the clicking consistency of pairs of annotators. For each pair, we compare the cases where both annotators clicked against the cases where only one of them clicked. The score is (#only one clicked) / (#at least one clicked). A score of 0 is perfect agreement. A score of 1 means the annotators' clicks did not overlap at all. For every image we select the best pair (lowest score). These best pairs are collected over all images and then sorted by score. The sorted scores are plotted below. Seven points are sampled uniformly from the 5th to the 95th percentile in steps of 15 (5:15:95), and the pair of annotations at each sampled point is shown.
Please e-mail us to suggest another evaluation metric.
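As a rough illustration of the score described above, here is a minimal Python sketch. It assumes each annotator's clicks can be represented as a set of item identifiers, which is an assumption and not necessarily how the annotations are stored. The function names (pair_score, best_pair, sample_percentiles) are hypothetical, and the handling of pairs where neither annotator clicked (skipped here) is also an assumption.

```python
from itertools import combinations


def pair_score(clicks_a, clicks_b):
    """Return (#only one clicked) / (#at least one clicked) for two annotators.

    Clicks are assumed to be iterables of hashable item ids (an assumption).
    Returns None when neither annotator clicked, since the ratio is undefined.
    """
    a, b = set(clicks_a), set(clicks_b)
    union = a | b
    if not union:
        return None
    return len(a ^ b) / len(union)


def best_pair(annotations):
    """Pick the lowest-scoring annotator pair for one image.

    annotations: dict mapping annotator name -> clicked item ids.
    Returns (score, (name_1, name_2)), or None if no pair can be scored.
    """
    scored = []
    for n1, n2 in combinations(sorted(annotations), 2):
        score = pair_score(annotations[n1], annotations[n2])
        if score is not None:
            scored.append((score, (n1, n2)))
    return min(scored) if scored else None


def sample_percentiles(sorted_items, start=5, step=15, stop=95):
    """Pick items at the start:step:stop percentiles of an already sorted list."""
    n = len(sorted_items)
    if n == 0:
        return []
    return [sorted_items[(p * (n - 1)) // 100] for p in range(start, stop + 1, step)]


if __name__ == "__main__":
    image = {"ann1": {1, 2, 3}, "ann2": {2, 3, 4}, "ann3": {1, 2, 3}}
    print(best_pair(image))
```

With the default arguments, sample_percentiles selects seven items, matching the seven sampled points mentioned in the text. The integer index rule used to map a percentile to a list position is one simple choice among several.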
Performance
[Figure: annotation pairs A to G, sampled from the sorted scores]