In this project we continue to work with warping images. Now,
rather than considering affine transformations of triangles, we are
considering global homographies that model changing the viewing
angle while keeping the camera in the same location.
We first capture multiple images from the same location, with the
camera pointing in different directions. These images are all
capturing the same light field, so in regions of overlap they differ only by a perspective shift, which we model as a homography.
The task now is to align and join the images to create an image with a wider field of view.
Homographies and Image Rectification
Before we move to image stitching, we first look at image
rectification, which is the task of applying the homography that
rectifies a selected rectangle. We compute this homography as the
least squares solution to a linear system that describes how points
transform under the homography. The key thing
here is that we are using projective coordinates which allow us to
describe how images change as the image sensor is rotated. These
projective coordinates are very interesting and connected to some
math that I would like to learn about. The equation we start with
is
[w x_i', w y_i', w]^T = H [x_i, y_i, 1]^T (*)
This is an example of computing with equivalence classes, as there
is an entire 1d subspace of R^3 that we make correspond to a single
point in R^2. From (*) we write:
x_i' = (h_11 x_i + h_12 y_i + h_13) / (h_31 x_i + h_32 y_i + h_33)
and
y_i' = (h_21 x_i + h_22 y_i + h_23) / (h_31 x_i + h_32 y_i + h_33)
Then organizing this into a system of equations, we have:
[-x_i, -y_i, -1, 0, 0, 0, x_i' x_i, x_i' y_i, x_i'] vec(H) = 0
[ 0, 0, 0, -x_i, -y_i, -1, y_i' x_i, y_i' y_i, y_i'] vec(H) = 0
Then we stack these equations to get a system that we solve by least squares.
In the case of image rectification, we look at the case where we
have four points (x_i, y_i), and compute the homography that takes
these points to the vertices of the unit square.
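As an illustration of how this system might be set up and solved, here is a rough NumPy sketch; the function name and the SVD-based solve are my own choices, not necessarily the exact code used in the project:

```python
import numpy as np

def compute_homography(src, dst):
    """Estimate H (3x3) with dst ~ H @ src from N >= 4 point pairs (x, y)."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, xp * x, xp * y, xp])
        rows.append([0, 0, 0, -x, -y, -1, yp * x, yp * y, yp])
    A = np.asarray(rows, dtype=float)
    # Least squares solution of A h = 0 subject to ||h|| = 1: the right
    # singular vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Rectification: send four selected corners to the vertices of the unit square.
unit_square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
# H = compute_homography(selected_corners, unit_square)
```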
Below, we look at an example of rectifying an off-center rectangle:
Now, we can look at taking three images and aligning them
homographically so that they agree on the location of the white
rectangle at the end of the aisle in the image.
After transforming images to agree on overlapping regions, we can
begin stitching the images together, using the image blending
technique we developed in project 2.
This story works for a three panel panorama, where we can clearly
align the two side images to the center image. But when there is an
arbitrary number of images, this method becomes more challenging.
Rather than aligning all images to a center image
and using a notion of global coordinates, we prefer to perform the
stitching iteratively, adding one new image to the stitch at a
time.
At each iteration, we treat the current stitched image as a function
from the unit square to pixel space. Each new iteration therefore
introduces a new coordinate system, which makes any global
parameterization awkward.
When using this iterative approach, we do not need to pay attention
to where the source images get mapped to in the final stitch.
Rather, we warp each image to the in progress stitched image, and
expand the bounds of the sampling region to accommodate the
increasing size of the stitch. The new bounds are computed by
looking at where the current homography takes the corners of the
unit square. When we increase the interpolation domain to include
the entire image of the unit square under the homography, we ensure
that the resultant stitch includes the entire new image.
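A minimal sketch of this bounds computation, assuming the stitch is tracked by a simple (xmin, ymin, xmax, ymax) box; the function names here are mine:

```python
import numpy as np

def warp_point(H, p):
    """Apply a homography to a single (x, y) point, dividing out w."""
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

def expanded_bounds(H, current_bounds):
    """Grow the sampling region so it covers the warped unit square.

    current_bounds: (xmin, ymin, xmax, ymax) of the in-progress stitch.
    """
    corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
    warped = np.array([warp_point(H, c) for c in corners])
    xmin, ymin, xmax, ymax = current_bounds
    xmin = min(xmin, warped[:, 0].min())
    ymin = min(ymin, warped[:, 1].min())
    xmax = max(xmax, warped[:, 0].max())
    ymax = max(ymax, warped[:, 1].max())
    return xmin, ymin, xmax, ymax
```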
This can lead to problems in the case where the computed homography
has high condition number, leading to the image of the unit square
becoming elongated. As the bounds accommodate this, the majority of
the content of the stitched image gets relegated to a small region
of the final image, as shown below.
To avoid this, we prefer to compute homographies between images with
large overlaps and small perspective changes, so that the
homographies remain gentle on the parts of the image that lack point
correspondence information.
Now we show the results of a couple of successful stitches.
These stitches were made with manually selected correspondences.
I introduced various hyperparameters in the stitching code, such as
epsilon tolerances for the blending and pixel-update masks, and all
the hyperparameters associated with the blending itself (such as the
Gaussian kernel size and standard deviation). Different choices of
these hyperparameters produce different blending effects, and
correspondingly different artifacts in the final image.
Now we tackle the task of automatically finding correspondences.
Previously we selected the points manually, but this approach is not
robust: it is slow and sensitive to small user errors.
We first compute the Harris response of our image by taking
horizontal and vertical derivatives, combining them to form the
second moment matrix, and then computing the response as the
determinant of the second moment matrix divided by its trace.
In the final implementation I used the provided code that calls the
skimage Harris response function, but for illustration purposes I
show the construction process below.
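Roughly, the construction looks like the following (a SciPy-based sketch, not the provided code; the blur scale is a choice of mine):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(gray, sigma=1.0, eps=1e-8):
    """Harris response: det(M) / trace(M) of the second moment matrix M."""
    # Horizontal and vertical image derivatives.
    Ix = sobel(gray, axis=1)
    Iy = sobel(gray, axis=0)
    # Gaussian-weighted entries of the second moment matrix at each pixel.
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    det = Ixx * Iyy - Ixy ** 2
    trace = Ixx + Iyy
    return det / (trace + eps)
```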
Once we have the Harris response, we take the Harris corners to be
the points that are local maxima in their (3, 3) neighborhoods. This
gives a set of points that is too dense to work with.
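For instance, with skimage (assuming `response` holds the Harris response from the sketch above):

```python
from skimage.feature import peak_local_max

# Pixels that are maximal within their 3x3 neighborhood
# (min_distance=1 corresponds to a 3x3 footprint).
corners = peak_local_max(response, min_distance=1)  # (N, 2) array of (row, col)
```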
Next, we use Non-Maximal Suppression to select a subset of well
spaced points with strong Harris response. We do so by first
sorting the points according to their response, and then
iteratively adding points from the list to a final subset. When we
encounter any points that are within a suppression radius of any
points already in the final subset, we skip them. This ensures that
we take the most interesting points that are sufficiently spread
apart.
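A sketch of this greedy suppression, with the suppression radius as a hyperparameter I chose arbitrarily for illustration:

```python
import numpy as np

def suppress(corners, response, radius=20):
    """Greedy non-maximal suppression with a fixed suppression radius.

    corners: (N, 2) array of (row, col) locations; response: Harris response.
    Returns the kept corners, strongest first.
    """
    # Visit candidates from strongest to weakest response.
    order = np.argsort(-response[corners[:, 0], corners[:, 1]])
    kept = []
    for idx in order:
        p = corners[idx]
        # Skip any point within `radius` of a point we have already kept.
        if all(np.hypot(*(p - q)) >= radius for q in kept):
            kept.append(p)
    return np.array(kept)
```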
After selecting a subset of points, we treat patches around each
point as a descriptor for it. We take (40, 40) patches, then blur
and downsample to (8, 8). We perform bias/gain normalization so that
the descriptors are invariant to affine lighting transformations.
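A sketch of the descriptor extraction; the blur scale and the border handling here are my own choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def describe(gray, corners, patch=40, out=8):
    """(40, 40) patches, blurred, downsampled to (8, 8), bias/gain normalized."""
    step, half = patch // out, patch // 2
    blurred = gaussian_filter(gray, sigma=step / 2)  # anti-alias before subsampling
    descs, kept = [], []
    for r, c in corners:
        if r < half or c < half:
            continue  # too close to the top/left border
        window = blurred[r - half : r + half : step, c - half : c + half : step]
        if window.shape != (out, out):
            continue  # too close to the bottom/right border
        window = (window - window.mean()) / (window.std() + 1e-8)  # bias/gain
        descs.append(window.ravel())
        kept.append((r, c))
    return np.array(descs), np.array(kept)
```

Returning the surviving corner locations alongside the descriptors keeps the two lists aligned for the matching step.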
Then we view each patch as a vector and compute pairwise inner
products between patches. This gives a "distance" between points
that we now use for defining correspondences.
For each point, we look at the ratio of (the distance to its nearest
neighbor in descriptor space) to (the distance to its second nearest
neighbor). If this ratio is low, the best match is significantly
closer than the runner-up, which means we have likely found an
accurate correspondence between the points.
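A sketch of the ratio test. I use squared Euclidean distances here, which, for the bias/gain-normalized descriptors above (all of the same norm), rank pairs the same way inner products do; the ratio threshold is a hyperparameter I picked for illustration:

```python
import numpy as np

def match(desc_a, desc_b, ratio=0.6):
    """Lowe-style ratio test between two sets of descriptors."""
    # All pairwise squared Euclidean distances.
    d = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    matches = []
    for i in range(d.shape[0]):
        nn = np.argsort(d[i])            # nearest and second nearest neighbors
        first, second = d[i, nn[0]], d[i, nn[1]]
        if first / (second + 1e-12) < ratio:
            matches.append((i, nn[0]))   # accept only decisive matches
    return matches
```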
Now we have a decent handle on which points correspond to each
other; however, there will still be points that are misidentified as
corresponding.
To account for this, we adapt our approach for computing
homographies. The presence of bad outliers in the set of
correspondences means that a least squares estimate will be pulled
away from the correct homography, but we would ideally like to
recover the "correct" homography anyway.
To do so, we employ the RANSAC method. This is a powerful method
that achieves robustness to noisy correspondences through random
sampling. The idea is to repeatedly select subsets of 4
corresponding point pairs, compute the homography that maps one side
of the correspondence to the other, and then evaluate how well this
homography explains the transformations undergone by the rest of the
points. Specifically, we apply the proposed homography to all source
points and count how many get mapped epsilon-close to their
corresponding points.
This turns out to be an extremely effective approach, while also being beautifully simple!
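A sketch of this procedure, reusing the compute_homography helper from the rectification section; the iteration count and epsilon are illustrative and assume pixel coordinates:

```python
import numpy as np

def ransac_homography(src, dst, iters=1000, eps=2.0, seed=0):
    """RANSAC: fit H to random 4-point samples, keep the largest inlier set."""
    rng = np.random.default_rng(seed)
    src_h = np.column_stack([src, np.ones(len(src))])  # homogeneous source points
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=4, replace=False)
        H = compute_homography(src[sample], dst[sample])  # from the earlier sketch
        mapped = src_h @ H.T
        mapped = mapped[:, :2] / mapped[:, 2:3]
        inliers = np.linalg.norm(mapped - dst, axis=1) < eps  # epsilon-close points
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best sample for the final homography.
    return compute_homography(src[best_inliers], dst[best_inliers]), best_inliers
```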
Now that we have automatic correspondences implemented, we can stitch together some larger panoramas.