In recent years, graph cuts have emerged as a powerful optimization technique
for minimizing energy functions that arise in low-level vision problems. Graph
cuts avoid the problems of local minima inherent in other approaches (such as
gradient descent). The goal of this thesis is to apply graph cuts to a
classical computer vision problem --- scene reconstruction from multiple
views, i.e. computing the 3-dimensional shape of the scene.
This thesis provides a technical result which greatly facilitates the
derivation of the scene reconstruction algorithm. Our result should also be
useful for developing other energy minimization algorithms based on graph
cuts. Previously such algorithms explicitly constructed graphs where a minimum
cut also minimizes the appropriate energy. It is natural to ask for what
energy functions we can construct such a graph. We answer this question for
the class of functions of binary variables that can be written as a sum of
terms containing three or fewer variables. We give a simple criterion for
functions in this class which is necessary and sufficient, as well as a
necessary condition for any function of binary variables. We also give a
general-purpose graph construction for functions that satisfy our criterion.
Using this result, we develop several algorithms for the scene reconstruction
problem. We formulate this problem in an energy minimization framework.
The energy that we minimize encodes all the constraints of the problem:
it treats the input images symmetrically, handles visibility properly,
and imposes spatial smoothness while preserving discontinuities. We
discuss several smoothness terms that we can use. Experimental results
show that our algorithms for two views are among the top performers,
and that the accuracy of the reconstruction is greatly improved when
more views are used.
In the last chapter of the thesis we address the issue of the efficiency
of min-cut/max-flow algorithms for computer vision applications.
We compare the running times of several standard
algorithms, as well as a new algorithm that we have recently developed. We
benchmark these algorithms on a number of typical graphs arising in vision.
In many cases our new algorithm works several times faster than other methods.