CS280A Project 4: Stitching and Photo Mosaics
by Junhua (Michael) Ma
Part A: Image Warping and Mosaicing
Homography & Rectification
Method & Explanations
Computing Homography: A homography is a transformation that accounts for not just translation but also other effects like scaling, rotation, and perspective. Given a set of N points in image 1 and the set of corresponding N points in image 2, where N >= 4, the homography transformation matrix can be solved from a least-squares system with the np.linalg.lstsq function. The system, shown below, fixes the scale factor i = 1, leaving a total of 8 unknowns a, b, c, d, e, f, g, h.
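The least-squares system referenced above, reconstructed here from the description (each correspondence (x, y) → (x', y') contributes two rows, and all N correspondences are stacked into a 2N-by-8 system):

\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -x x' & -y x' \\
0 & 0 & 0 & x & y & 1 & -x y' & -y y'
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix}
=
\begin{bmatrix} x' \\ y' \end{bmatrix}

where H = [[a, b, c], [d, e, f], [g, h, 1]]. A minimal sketch of the solver under this setup (the function name is my own):

import numpy as np

def compute_homography(pts1, pts2):
    # pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    # Builds the 2N x 8 system above and solves for a..h in a least-squares sense.
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)  # scale factor i = 1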
Rectification: Given an image, we can specify 4 or more points to define an area A within it. To rectify the selected area A, we compute the homography transformation H between A and a blank rectangular canvas image C, then warp A onto C; the rectified A appears in C. Conversely, by inverting H we can warp any image C back onto the selected area A in the original image, which allows adding new content in a way that fits the original image's geometry and context. A sketch of the forward direction is shown below.
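A minimal sketch of rectification, using skimage's built-in warp in place of a hand-rolled one (the function name is my own; quad holds the 4 selected (x, y) corners of area A in top-left, top-right, bottom-right, bottom-left order):

import numpy as np
from skimage import transform

def rectify(image, quad, out_w, out_h):
    # Corners of the blank rectangular canvas C, matching quad's order.
    canvas = np.array([[0, 0], [out_w - 1, 0],
                       [out_w - 1, out_h - 1], [0, out_h - 1]], float)
    # skimage's warp expects a map from output coords to input coords,
    # so estimate the homography from the canvas back onto the quad.
    tform = transform.ProjectiveTransform()
    tform.estimate(canvas, quad)
    return transform.warp(image, tform, output_shape=(out_h, out_w))

The inverse direction (placing new content onto A) follows by estimating the transform the other way around.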
Results
Lightning McQueen at Petersen Automotive Museum
Original Image
Two areas are defined on the image, with the outer boundary of each area defined by 4 points.
The green points define area 1.
The blue points define area 2.
Rectified (Green Area)
Rectified (Blue Area)
Inverse Transform (Green Area)
Mapped Image
Star Wars Opening Crawl Effect
Original Image
Rectified
Image Stitching & Mosaics
Method & Explanations
Points of Correspondence: Correspondences between two images are defined by at least 8 pairs of points that mark the same feature in both images. Given a pair of images, correspondence points are labeled manually with a simple program built on ginput from the Matplotlib library. The output is two sets of points stored in a file, with each line consisting of points (x1, y1) and (x2, y2), the coordinates of a corresponding pair of points in image 1 and image 2.
Recovering Homography: The homography transformation matrix H between two images is computed the same way as in the previous section. Additionally, homographies compose by matrix multiplication: given images A, B, C and writing points as column vectors (p' ∝ Hp), the homography from A to C is H_AC = H_BC * H_AB, i.e., H_AB is applied first, then H_BC. For my setup, given a series of images related by projective transformations, the center image IM_C is picked and all homography transformations are computed as transformations to IM_C, as sketched below.
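A minimal sketch of this chaining, assuming pairwise homographies H_pair[i] that map IM_i to IM_(i+1) and a 0-based center index c (names and setup are my own):

import numpy as np

def homographies_to_center(H_pair, c):
    # Returns H_to_c[i] mapping IM_i to the center image IM_c,
    # composing as H_AC = H_BC @ H_AB (column-vector convention).
    n = len(H_pair) + 1
    H_to_c = [np.eye(3) for _ in range(n)]
    for i in range(c - 1, -1, -1):      # images left of the center
        H_to_c[i] = H_to_c[i + 1] @ H_pair[i]
    for i in range(c + 1, n):           # images right of the center
        H_to_c[i] = H_to_c[i - 1] @ np.linalg.inv(H_pair[i - 1])
    return H_to_c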
Warping: The overall procedure for my warping setup is shown below.
Given a series of N images IMS = [IM_1, IM_2, ..., IM_N] related by projective transformations, with center image IM_C.
→ Step 0: Compute all homography transformations H_i from IM_i to IM_C.
→ Step 1: Determine final size of the entire mosaic image using transformed corner points of images in IMS.
→ Step 2: Create N blank images B_1, B_2, ..., B_N, one for each image in IMS.
→ Step 3: Place center image IM_C (no transform needed) onto its canvas B_C.
→ Step 4: For all IM_i != IM_C, perform inverse homography transform warping with interpolation and place result on its own canvas B_i.
→ Step 5: Blend B_1, B_2, ..., B_N one by one to obtain final mosaic.
For step 4, computing the transformed corner points of IM_i gives its transformed area in IM_C's frame. We can then use sk.draw.polygon to obtain the coordinates of all points in this area and apply the inverse homography transform to them to find which pixel in IM_i corresponds to each point of the area in IM_C's frame. Bilinear interpolation is applied since the resulting pixel coordinates in IM_i are generally not integers. A sketch of this inverse-warping step is shown below.
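A minimal sketch of this step, assuming a float color image im of shape (h, w, C) and the canvas size from Step 1 (function and variable names are my own; sk.draw.polygon refers to skimage.draw.polygon):

import numpy as np
from skimage.draw import polygon

def warp_image(im, H, out_h, out_w):
    # H maps im's (x, y, 1) coordinates into the mosaic canvas frame.
    h, w = im.shape[:2]
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], float).T
    tc = H @ corners
    tc = tc[:2] / tc[2]                              # corners in canvas frame
    rr, cc = polygon(tc[1], tc[0], shape=(out_h, out_w))
    # Inverse-map every canvas pixel inside the polygon back into im.
    src = np.linalg.inv(H) @ np.stack([cc, rr, np.ones_like(rr)]).astype(float)
    x, y = src[0] / src[2], src[1] / src[2]
    ok = (x >= 0) & (x <= w - 1) & (y >= 0) & (y <= h - 1)
    x, y, rr, cc = x[ok], y[ok], rr[ok], cc[ok]
    # Bilinear interpolation at the (generally non-integer) source coords.
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = (x - x0)[:, None], (y - y0)[:, None]
    canvas = np.zeros((out_h, out_w, im.shape[2]))
    canvas[rr, cc] = ((1 - fx) * (1 - fy) * im[y0, x0] + fx * (1 - fy) * im[y0, x1]
                      + (1 - fx) * fy * im[y1, x0] + fx * fy * im[y1, x1])
    return canvas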
Blending/Stitching: To blend the images after they are all transformed to the center image's frame, a simple alpha-channel feathering method is applied between each adjacent pair of images. A distance transform (scipy.ndimage.distance_transform_edt) is used to compute masks that gradually fall from 1 to 0 at the edges of the transformed images. For two adjacent images A, B and mask M, the images are blended with the formula A * M + B * (1 - M), as sketched below.
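A sketch of one common way to build such a mask M with the distance transform; normalizing the two distances in the overlap (an assumption, since the exact weighting is not specified above) makes the weights fall off smoothly and sum to 1:

import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_blend(a, b, mask_a, mask_b):
    # a, b: (H, W, C) canvases already warped into the mosaic frame;
    # mask_a, mask_b: boolean masks of where each image has content.
    # Each pixel's weight is its distance to that image's edge.
    da = distance_transform_edt(mask_a)
    db = distance_transform_edt(mask_b)
    m = np.zeros_like(da)
    covered = da + db > 0
    m[covered] = da[covered] / (da[covered] + db[covered])
    m = m[..., None]                     # broadcast over color channels
    return a * m + b * (1 - m)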
Results
View of LA from Getty Center
Image 1
Image 2
Image 3
Correspondence Points Pair 1
Correspondence Points Pair 2
Mosaic
Xučyun Ruwway Courtyard
Image 1
Image 2
Image 3
Image 4
Image 4 (Repeated)
Image 5
Image 6
Image 7
Note: I took the above photos by hand with my iPhone 11. I tried to keep the center of projection fixed and rotate only horizontally, but I may have shifted it somewhat for the first 2 images due to an unnatural hand position; the potential impact of this is analyzed later. In addition, some camera settings may not be constant across all images, as the iPhone camera applies some automatic adjustments. There is a notable brightness change between images 1-4 and images 5-7.
Mosaic (Images 3 to 5, N=3)
Mosaic (Images 2 to 6, N=5)
The above 5-image result is poor, likely due to the larger span of horizontal rotation, which is now close to 180 degrees. The full 7-image mosaic is therefore not attempted in this section; a potential solution with cylindrical mapping is examined in the following section.
Cylindrical Mapping & Panorama
Method & Explanations
Cylindrical mapping allows the creation of panoramas: wide mosaics that support up to a full 360 degrees of horizontal rotation. This may help fix the issue with the previous results, which used planar mappings.
For cylindrical mapping, the following steps are performed:
Given an image with coordinate (u, v) to be mapped to cylindrical coordinate (xt, yt), image center (uc, vc), and focal length f:
→ Step 1: Convert (u, v) to 3D coordinates: (x, y, z) = (u - uc, v - vc, f).
→ Step 2: Project (x, y, z) onto the unit cylinder: (xh, yh, zh) = (x, y, z) / sqrt(x^2 + z^2).
→ Step 3: Convert (xh, yh, zh) to 2D cylindrical coordinate: (theta, h) = (arcsin(xh), yh). This is derived from the full conversion (sin(theta), h, cos(theta)) = (xh, yh, zh).
→ Step 4: Convert cylindrical coordinate to cylindrical image coordinates: (xt, yt) = (f * theta, f * h).
→ Step 5: Shift (xt, yt) to fit the image plane, top left should be (0, 0).
The focal length f is obtained by trial and error. After all images and correspondence points are mapped to cylindrical coordinates, the exact same mosaic pipeline from the previous sections (recovering homography, warping, and stitching) is used. A sketch of the projection is shown below.
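A minimal sketch of the projection, implemented in the inverse direction so every output pixel gets a value (centering the output at the image center stands in for the shift in Step 5, and nearest-neighbor sampling is a simplification; names are my own):

import numpy as np

def cylindrical_project(im, f):
    # For each cylindrical pixel (xt, yt), recover the source pixel (u, v)
    # by inverting Steps 1-4 above, then sample it.
    h, w = im.shape[:2]
    uc, vc = w / 2.0, h / 2.0
    yt, xt = np.indices((h, w), dtype=float)
    theta, hh = (xt - uc) / f, (yt - vc) / f    # invert Step 4
    u = f * np.tan(theta) + uc                  # from x / z = tan(theta)
    v = f * hh / np.cos(theta) + vc             # from y / z = h / cos(theta)
    ui, vi = np.round(u).astype(int), np.round(v).astype(int)
    ok = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    out = np.zeros_like(im)
    out[ok] = im[vi[ok], ui[ok]]
    return out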
Results
Image 7 (Original)
Image 7 (Cylindrical, f=550)
Panorama (All Images 1 to 7, N=7)
The above result has a visible vertical discrepancy between images 2 and 3. This may be due to a change in the camera's center of projection between the two shots, as the blend otherwise looks consistent.
Part B: Feature Matching & Autostitching
Extracting Feature Descriptors
Method & Explanations
Corner Feature Detection: The Harris corner detector uses changes in local gradients to pick out image features that are likely corner points. The result of Harris corner detection is a list of likely corner points, together with a corner response value for each pixel of the image.
Harris Corner Response
Detected Corners (Top 10%)
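For reference, a minimal sketch of obtaining the response map and corner candidates with skimage (an assumption about tooling; the parameters are illustrative):

from skimage.color import rgb2gray
from skimage.feature import corner_harris, corner_peaks

def harris_corners(im):
    # Per-pixel Harris corner response plus (row, col) peak coordinates.
    response = corner_harris(rgb2gray(im))
    coords = corner_peaks(response, min_distance=5)
    return response, coords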
Adaptive Non-Maximal Suppression: Based on the approach described in the paper "Multi-Image Matching using Multi-Scale Oriented Patches" by Brown et al., the detected corner points are filtered ("suppressed") so that they become more evenly spread out, which leads to better overall feature points. The idea is to keep only evenly spaced feature points with high corner response and discard the rest. In my implementation, each corner point is assigned a radius initialized to 0, and the radii of all points increase by 1 at each iteration. Whenever two or more circles overlap, only the point with the largest corner response remains. This process continues until a desired number of feature points is reached, at which point the radius value is also the minimum suppression radius for the current outcome. An equivalent formulation is sketched after the result below.
Adaptive Non-Maximal Suppression (N=50, Radius=51)
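A sketch of the equivalent closed-form version from the Brown et al. paper: instead of growing radii step by step, compute each point's suppression radius directly as the distance to its nearest sufficiently stronger neighbor, then keep the points with the largest radii (c_robust = 0.9 follows the paper; the other defaults are illustrative):

import numpy as np
from scipy.spatial.distance import cdist

def anms(coords, response, n_keep=500, c_robust=0.9):
    # coords: (N, 2) corner (row, col) positions from the Harris step;
    # response: per-pixel Harris response map.
    r = response[coords[:, 0], coords[:, 1]]
    d = cdist(coords.astype(float), coords.astype(float))
    suppressed_by = r[:, None] < c_robust * r[None, :]   # point j beats point i
    d[~suppressed_by] = np.inf
    radii = d.min(axis=1)           # the global maximum gets radius = inf
    keep = np.argsort(radii)[::-1][:n_keep]
    return coords[keep]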
Feature Descriptors: To match feature points more reliably, we extract feature descriptors, which are small patches surrounding each feature point. In my implementation, all feature descriptors are 40-by-40 patches downsampled to 8-by-8 gain-normalized patches, as sketched after the example below.
Example Feature Descriptors (8 by 8)
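A minimal sketch of the extraction, assuming a grayscale image and features at least 20 pixels from the border (the Gaussian pre-blur before subsampling is an assumption to avoid aliasing; skimage.transform.resize would work as well):

import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(im_gray, row, col, size=40, out=8):
    # Cut a 40x40 window around the feature, low-pass filter, subsample
    # every 5th pixel down to 8x8, then normalize to zero mean and unit
    # variance (gain normalization).
    h = size // 2
    patch = gaussian_filter(im_gray[row - h:row + h, col - h:col + h], sigma=2)
    patch = patch[::size // out, ::size // out]
    return (patch - patch.mean()) / (patch.std() + 1e-8)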
Feature Matching & Recovering Homography
Method & Explanations
Feature Matching: To match the features of images A and B using descriptor sets DA and DB: for each feature descriptor DA_i in DA, compute the Euclidean distance between DA_i and every descriptor in DB, and take the two closest, DB_NN1 (smallest distance d_NN1) and DB_NN2 (second smallest distance d_NN2), the 2 nearest neighbors of DA_i. Lowe's ratio test is then applied: DA_i and DB_NN1 form a match only if d_NN1 / d_NN2 is less than a certain threshold. The idea is that to pass the test, d_NN1 must be sufficiently smaller than d_NN2, which yields a more confident match since there are no close alternatives. A sketch is shown below.
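A minimal sketch, assuming flattened descriptor arrays (the brute-force distance matrix is fine at this scale; the 0.6 ratio threshold is illustrative):

import numpy as np

def match_features(DA, DB, ratio=0.6):
    # DA, DB: (Na, k) and (Nb, k) arrays of flattened descriptors.
    dists = np.linalg.norm(DA[:, None, :] - DB[None, :, :], axis=2)
    nn = np.argsort(dists, axis=1)
    d1 = dists[np.arange(len(DA)), nn[:, 0]]    # distance to DB_NN1
    d2 = dists[np.arange(len(DA)), nn[:, 1]]    # distance to DB_NN2
    keep = d1 / d2 < ratio                      # Lowe's ratio test
    return np.column_stack([np.nonzero(keep)[0], nn[keep, 0]])  # index pairs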
Recovering Homography: After feature descriptors are matched and a list of corresponding feature points is obtained, we can apply RANSAC to recover the homography. Since the feature matching step is not perfect and can produce incorrect matches, the plain least-squares homography setup from before would not work reliably; RANSAC is robust to such outliers. The main steps are shown below, followed by a sketch.
Given lists of corresponding feature points A and B from the two images, iterate N times; in each iteration:
→ Step 1: Select 4 random pairs of feature points (Ai, Bi).
→ Step 2: Compute the homography transformation matrix H_est from the 4 pairs selected in Step 1.
→ Step 3: Transform all points A into A' using H_est.
→ Step 4: Obtain the "inliers" as all pairs (Ai, Bi) such that the distance between Ai' and Bi is less than a certain threshold.
→ Step 5: Keep this inlier set if it is the largest seen so far.
After N iterations:
→ Step 6: Compute the final homography transformation matrix H from the largest inlier set.
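A minimal sketch of these steps, reusing the compute_homography solver sketched in Part A (n_iters and the pixel threshold are illustrative defaults):

import numpy as np

def ransac_homography(A, B, n_iters=1000, thresh=2.0):
    # A, B: (N, 2) arrays of matched feature points.
    best = np.zeros(len(A), dtype=bool)
    for _ in range(n_iters):
        idx = np.random.choice(len(A), 4, replace=False)      # Step 1
        H = compute_homography(A[idx], B[idx])                # Step 2
        P = np.column_stack([A, np.ones(len(A))]) @ H.T       # Step 3
        proj = P[:, :2] / P[:, 2:3]
        inliers = np.linalg.norm(proj - B, axis=1) < thresh   # Step 4
        if inliers.sum() > best.sum():                        # Step 5
            best = inliers
    return compute_homography(A[best], B[best]), best         # Step 6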
Results
Xučyun Ruwway Courtyard
Feature Matching
With Manual Correspondences
With Auto-Matching & Stitching
Xučyun Ruwway Courtyard (Cylindrical Mapping)
With Manual Correspondences
With Auto-Matching & Stitching
View of LA from Getty Center
With Manual Correspondences
With Auto-Matching & Stitching
Berkeley Street View
Feature Matching
Auto-Matching & Stitching (2 Images)
Auto-Matching & Stitching (3 Images)
Key Takeaway
Transformation matrices like the homography transformation matrix are remarkably powerful: they can derive the general spatial transformation between images from just a few correspondence points. Through the properties of matrix multiplication and inversion, the obtained transformations can be combined or inverted to enable interesting effects, like rectification, that relate to the geometric properties of the image.