CS280A Project 1: Colorizing the Prokudin-Gorskii Photo Collection

by Junhua (Michael) Ma

Methods & Explanations


Overview


Step 1: Initial Processing

Input image data is read as a 2D NumPy array and converted to a floating-point data type with values rescaled to the range 0 to 1 (intensity rescaling).

Explanations
Scaling intensities to always lie between 0 and 1 allows the same program to work on images whose raw values are not in the 0 to 255 range (e.g. images of type .tif).
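The rescaling can be sketched with NumPy alone (skimage.util.img_as_float performs an equivalent conversion); the function name here is illustrative:

```python
import numpy as np

def to_unit_range(img):
    """Rescale integer image data to float64 values in [0, 1].

    Dividing by the dtype maximum handles 8-bit .jpg input
    (max 255) and 16-bit .tif input (max 65535) alike, so the
    rest of the pipeline can assume a fixed [0, 1] range.
    """
    return img.astype(np.float64) / np.iinfo(img.dtype).max

# Both input types map onto the same [0, 1] range:
jpg_like = np.array([[0, 255]], dtype=np.uint8)
tif_like = np.array([[0, 65535]], dtype=np.uint16)
```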


Step 2: Border Detection & Cropping

Detection: the Canny edge detector from the SkImage library (skimage.feature.canny) is used to detect edges from gradients. It returns a 2D boolean array of the same size as the input image, with true (1) marking edge pixels and false (0) marking non-edge pixels. Vertical and horizontal edges are then picked out as borders by examining each row and column of the Canny output and checking whether the number of edge pixels exceeds a certain threshold.

Cropping: the four outer borders (top, bottom, left, right) are cropped by searching from a horizontal or vertical position close to each original edge and taking the first border encountered as the position of the cut.

Explanations
The result of edge detection is based on the gradient and is therefore more reliable at picking up borders than thresholding the original pixel values, since "blackness" may be relative rather than a fixed range of pixel values. For cropping, the method assumes the border widths do not vary too much, so that the search window always covers the border. The method may fail when borders blend into the channel images, causing the edge detector to miss them or detect them incorrectly.
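The row/column scan described above can be sketched as follows. It operates on a boolean edge map like the one skimage.feature.canny returns; the threshold fraction is an illustrative assumption:

```python
import numpy as np

def find_borders(edges, frac=0.5):
    """Return row and column indices that look like border lines.

    `edges` is a 2D boolean edge map (e.g. from skimage.feature.canny).
    A row or column is flagged when its count of edge pixels exceeds
    `frac` of its length; `frac` is a placeholder value to be tuned.
    """
    h, w = edges.shape
    border_rows = np.where(edges.sum(axis=1) > frac * w)[0]
    border_cols = np.where(edges.sum(axis=0) > frac * h)[0]
    return border_rows, border_cols

# Toy edge map with a solid horizontal edge across row 2:
edges = np.zeros((5, 5), dtype=bool)
edges[2, :] = True
```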


Step 3: Channel Separation & Enhancement

Separation: using the inner borders detected in step 2, the three channel images are separated. The exact positions of the two horizontal cuts are determined by starting at an estimated horizontal position and searching both up and down for the closest border.

Enhancement: after the three channels are separated, each is enhanced using Contrast Limited Adaptive Histogram Equalization (CLAHE) from the SkImage library (skimage.exposure.equalize_adapthist), which boosts local contrast to deal with overexposure in the channel images.

Explanations
From testing, CLAHE makes some barely visible details of the channel images more visible and increases overall contrast, which leads to much more vibrant and realistic coloring when the channels are combined in the final step.
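A minimal sketch of the separation, cutting at exact thirds as the starting estimate (in the real pipeline the detected borders refine the two cut positions, and each channel would then be passed through skimage.exposure.equalize_adapthist for CLAHE):

```python
import numpy as np

def split_plate(plate):
    """Split the vertically stacked glass plate into B, G, R thirds.

    Prokudin-Gorskii plates stack the channels top to bottom as
    blue, green, red. Cutting at exact thirds is only the starting
    estimate; the detected inner borders refine the cut positions.
    """
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

plate = np.arange(18, dtype=np.float64).reshape(9, 2)
b, g, r = split_plate(plate)
```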


Step 4: Channel Alignment

Features: a feature image is computed for each channel as a weighted superposition of its edges and the original channel image. Edges are computed with the Canny edge detector from the SkImage library (skimage.feature.canny), and the resulting 2D array is weighted and added to the original channel image before all pixels are rescaled to lie between 0 and 1. The computed feature, for the most part, shows clearly visible edges on top of a dimmed version of the original channel image.
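The weighted superposition can be sketched as below; the weight is an illustrative placeholder, and `edges` stands for the boolean map returned by skimage.feature.canny:

```python
import numpy as np

def edge_feature(channel, edges, weight=2.0):
    """Superimpose a weighted edge map onto the channel image.

    `channel` is a float image in [0, 1] and `edges` a boolean edge
    map of the same shape. Adding the weighted edges and dividing by
    the new maximum rescales the result back into [0, 1], leaving
    bright edges over a dimmed copy of the original image.
    """
    combined = channel + weight * edges.astype(np.float64)
    return combined / combined.max()

channel = np.array([[0.0, 1.0]])
edges = np.array([[True, False]])
```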

Alignment: the alignment process aligns the computed feature images, scoring candidate alignments by their L2-norm distance. With the blue channel fixed, a range of possible displacements of the green and red feature images is tried to find the displacement that minimizes the L2-norm against the blue channel. Specifically, the feature images are padded and np.roll is used to apply each displacement; only the common area shared by the channel images is passed to the L2-norm scoring function.
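A simplified sketch of the displacement search (padding and overlap handling omitted; `max_shift` is an illustrative value):

```python
import numpy as np

def l2_align(channel, reference, max_shift=3):
    """Brute-force search for the (dy, dx) displacement minimizing
    the L2 norm (sum of squared differences) against the fixed
    reference channel. np.roll applies each candidate shift; the
    real implementation scores only the common overlap region.
    """
    best, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

reference = np.zeros((8, 8))
reference[4, 4] = 1.0
displaced = np.roll(reference, (2, -1), axis=(0, 1))
```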

Pyramid Speedup: the rescale function from the SkImage library (skimage.transform.rescale) is used to scale down high-resolution images, and the pyramid is built layer by layer, each layer scaled down by a factor of 2 relative to the previous one. The pyramid is then traversed from the lowest-resolution layer to the highest; the same alignment procedure runs at each layer, and the displacement accumulated from previous layers is multiplied by 2 and added to the displacement found at the current layer.

Explanations
From testing, even without features the alignment performs well for almost all sample images except emir.tif. A likely reason is that the emir's clothing appears with quite different intensities across the three channels, which confuses the L2-norm scoring. The edge feature is therefore added to help alignment when intensities differ greatly across channels for the same area.

For scoring, the L2-norm is used for its overall effectiveness and efficiency. From testing, no notable improvement was found with Normalized Cross-Correlation (NCC) as the scoring function. The structural similarity index was considered for its strong matching capability but was ultimately not used because it takes too long to run.

For the pyramid speedup, the lower resolutions perform rough alignment and the higher resolutions refine it. This is much faster than searching the high-resolution image directly, which requires a very large search space for proper alignment while operating on a large image array.
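A sketch of the coarse-to-fine recursion, with a plain 2x subsample standing in for skimage.transform.rescale and a small brute-force L2 search at each level:

```python
import numpy as np

def search(channel, reference, max_shift=2):
    """Single-level brute-force L2 search over small displacements."""
    best, best_score = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            s = np.sum((np.roll(channel, (dy, dx), axis=(0, 1))
                        - reference) ** 2)
            if s < best_score:
                best_score, best = s, (dy, dx)
    return best

def pyramid_align(channel, reference, levels):
    """Coarse-to-fine alignment: recurse at half resolution, double
    the coarse displacement, then refine with a small search at the
    current scale. The [::2, ::2] subsample is a stand-in for
    skimage.transform.rescale."""
    if levels == 0:
        return search(channel, reference)
    coarse = pyramid_align(channel[::2, ::2], reference[::2, ::2],
                           levels - 1)
    dy, dx = 2 * coarse[0], 2 * coarse[1]
    fy, fx = search(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
    return (dy + fy, dx + fx)
```

A large displacement is recovered even though each level only searches a 5x5 neighborhood, which is the point of the pyramid.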


Step 5: Final Adjustments

The RGB mapping of the final combined image from the alignment step is adjusted: the green channel is scaled down the most, and the red channel slightly.

Explanations
Even with good alignment, the output images tend to look greener than natural. The amount of green is therefore reduced relative to the red and blue channels, which makes the output cleaner and more realistic.
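A sketch of the adjustment; the scale factors are made-up placeholders (the writeup only specifies that green is reduced the most and red slightly):

```python
import numpy as np

def adjust_colors(rgb, scales=(0.95, 0.85, 1.0)):
    """Scale the R, G, B channels of an H x W x 3 float image.

    Green gets the largest reduction and red a slight one, per the
    observation that outputs otherwise look too green; the exact
    factors here are illustrative values that would need tuning.
    """
    return np.clip(rgb * np.asarray(scales), 0.0, 1.0)

white = np.ones((1, 1, 3))
adjusted = adjust_colors(white)
```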

Results


Low Resolution Input Images (.jpg)

The program performs well on all low-resolution images, with an average runtime of about 5 seconds per image.

cathedral.jpg

offset: g (-5, 4), r (-5, 2)

monastery.jpg

offset: g (-6, 2), r (-5, 2)

tobolsk.jpg

offset: g (-3, 2), r (-4, 1)

High Resolution Input Images (.tif)

The program performs well for most high-resolution images, with an average runtime of about 50 seconds per image. The weakest result is melons.tif, which may be caused by the difficulty of feature detection with so many similar melons in the picture.

church.tif

offset: g (-42, 13), r (-42, 5)

emir.tif

offset: g (-25, 13), r (-46, 14)

harvesters.tif

offset: g (-23, 18), r (-38, 15)

icon.tif

offset: g (18, 43), r (17, 24)

lady.tif

offset: g (-66, 25), r (-150, 26)

melons.tif

offset: g (-37, 8), r (-60, 15)

onion_church.tif

offset: g (), r ()

sculpture.tif

offset: g (), r ()

self_portrait.tif

offset: g (), r ()

three_generations.tif

offset: g (), r ()

train.tif

offset: g (), r ()

Problems & Challenges


Border Detection

The border detection method does not work perfectly for all images; it can occasionally miss border edges or incorrectly mark vertical or horizontal lines as borders. Most incorrectly marked borders are ignored by the cropping method, which searches for borders only in areas very close to the outer edge of the image. When borders are missed, the alignment process must increase its displacement range to cover the extra offset caused by the uncropped borders while still achieving good alignment, which increases runtime.


Stains Removal

I tried several denoising and filtering methods provided by SkImage, as well as writing my own, to remove stains that disagree across channels, but have been unsuccessful so far. Fixing this issue would greatly improve overall image quality.


Alignment by Features

I tried multiple features and was initially especially interested in using corners for alignment. However, the corners detected by the SkImage methods I tried did not seem to provide good features for matching. Still, corners remain promising features that may help alignment in some difficult cases and have some advantages over edges.