We introduce the term cosegmentation, which denotes the task of
simultaneously segmenting the common parts of an image pair. A
generative model for cosegmentation is presented. Inference in the
model leads to minimizing an energy with an MRF term encoding
spatial coherency and a global constraint that attempts to match
the appearance histograms of the common parts. This energy has not
been proposed previously, and its optimization is challenging: it is
NP-hard. For this problem, we present a novel optimization scheme,
which we call trust region graph cuts. We demonstrate that this
framework has the potential to benefit a wide range of research
areas: object-driven image retrieval, video tracking and
segmentation, and interactive image editing. The power of the
framework lies in its generality: the common part can be a rigid or
non-rigid object (or scene) observed from different viewpoints, or
even similar objects of the same class.
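
To make the structure of such an energy concrete, the following is a
minimal sketch, not the formulation used in this work. It assumes an
Ising-style smoothness term on a 4-connected grid and an L1 distance
between the foreground histograms of the two images; both choices, the
weights, and all names (cosegmentation_energy, pairwise_weight,
hist_weight) are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosegmentation_energy(labels1, labels2, bins1, bins2, n_bins,
                          pairwise_weight=1.0, hist_weight=1.0):
    """Toy energy: smoothness per image plus a global histogram mismatch.

    labels1, labels2: 2-D integer arrays, 1 = common part, 0 = background.
    bins1, bins2:     2-D integer arrays giving each pixel's histogram bin.
    The Ising smoothness and the L1 histogram distance are assumptions.
    """
    def ising(labels):
        # Count label changes between horizontally and vertically
        # adjacent pixels (4-connected neighbourhood).
        horiz = np.count_nonzero(labels[:, 1:] != labels[:, :-1])
        vert = np.count_nonzero(labels[1:, :] != labels[:-1, :])
        return horiz + vert

    def foreground_histogram(labels, bins):
        # Histogram of bin indices over pixels labelled as common part.
        return np.bincount(bins[labels == 1], minlength=n_bins)

    smoothness = ising(labels1) + ising(labels2)
    mismatch = np.abs(foreground_histogram(labels1, bins1)
                      - foreground_histogram(labels2, bins2)).sum()
    return pairwise_weight * smoothness + hist_weight * mismatch
```

The histogram term depends on all foreground pixels of both images at
once, so, unlike the smoothness term, it does not decompose into
per-pixel or pairwise costs. This coupling is what separates such an
energy from a standard MRF segmentation energy.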