Monday, August 27, 2012

SOCIS - part 3

In this third part of my log I will talk about determining metrics from images transformed with POT, metrics which will help in solving the discontinuity problem.

The main point of "stabilizing" the POT is that, because of its line-by-line training, the actual transform applied can vary from line to line, producing "discontinuities" along the vertical axis of the transformed image. Such irregularities are hard to code, so the 2D coder applied afterwards doesn't work as well as it could. As we've seen in the previous post, at low bitrates these discontinuities become noticeable. They are clearly more prominent in the BIFR-compressed image than in the waterfill one. It thus seems that non-optimal rate allocation might worsen the discontinuity problem.

In order to address this issue it is of course necessary to get a measure of the possible causes of the problem. One source of discontinuities could be the image itself, which naturally presents variations that are amplified by the POT. The second possibility is the transform "training" process. You will remember from the previous post that I talked about side information and the t parameter used in computing the rotation matrix. This parameter takes values between -1 and +1 and depends on the correlation between bandlines. If the two components (bandlines) are loosely correlated, then the t parameter can easily "jump" from -1 to +1. This is not desirable, as a t parameter which continuously jumps like this will produce discontinuities in the transformed image. In the image below, we see a plot (like the one in the previous post) of this t parameter for a specific image.

Notice the variations between bandlines 150 and 200.
So, as mentioned in the previous post, one metric we'd like to determine is the variance of the differences of t values across consecutive lines. Another metric is the number of -1 to +1 (and vice versa) jumps the t parameter makes.
In the case of the lossless transform, the rotation matrix is decomposed into a product of elementary matrices, each one being equivalent to a sequence of lifting steps. This lifting decomposition applied to the KLT gives us the reversible KLT (RKLT), which guarantees the lossless nature of the transform. There are two possible lifting decompositions, depending on whether |t| >= p or |t| < p, where p = sqrt(1 - t^2) (equivalently, on whether |t| >= 1/sqrt(2) or not). If t were to jump often between |t| >= p and |t| < p, we would again have discontinuities in the transformed image. Our third metric is thus the number of these p-jumps.

I have written a C++ program which, given the side information file of an image, computes these 3 metrics. The program was run on various images in order to see how these metrics vary from image to image and to determine the best approach to solving the discontinuity problem.
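As a minimal sketch of how these metrics can be computed, assume the t values of one bandline pair have already been parsed from the side information file into a vector, one value per image line; the threshold used to decide when t is "near" -1 or +1 is an assumption of the sketch, not something taken from Delta.

```cpp
// Sketch of the three metrics; names and thresholds are illustrative.
#include <cmath>
#include <cstddef>
#include <vector>

struct TMetrics {
    double diffVariance;  // variance of the t differences between consecutive lines
    int signJumps;        // number of -1 <-> +1 jumps of t
    int pJumps;           // number of switches between |t| >= p and |t| < p
};

TMetrics computeMetrics(const std::vector<float>& t, float signThreshold = 0.9f) {
    TMetrics m{0.0, 0, 0};
    if (t.size() < 2) return m;

    // 1) Variance of the differences t[i] - t[i-1] across lines.
    std::vector<double> diffs;
    double mean = 0.0;
    for (std::size_t i = 1; i < t.size(); ++i) {
        diffs.push_back(t[i] - t[i - 1]);
        mean += diffs.back();
    }
    mean /= diffs.size();
    for (double d : diffs)
        m.diffVariance += (d - mean) * (d - mean);
    m.diffVariance /= diffs.size();

    // 2) -1 <-> +1 jumps: t crossing from near one extreme to near the other
    //    (the 0.9 "near" threshold is an assumption of this sketch).
    // 3) p-jumps: since p = sqrt(1 - t^2), the test |t| >= p is the same as
    //    |t| >= 1/sqrt(2), so we just count crossings of that threshold.
    const double pThreshold = 1.0 / std::sqrt(2.0);
    for (std::size_t i = 1; i < t.size(); ++i) {
        const bool wasNeg = t[i - 1] <= -signThreshold, isPos = t[i] >= signThreshold;
        const bool wasPos = t[i - 1] >= signThreshold,  isNeg = t[i] <= -signThreshold;
        if ((wasNeg && isPos) || (wasPos && isNeg))
            ++m.signJumps;
        if ((std::fabs(t[i - 1]) >= pThreshold) != (std::fabs(t[i]) >= pThreshold))
            ++m.pJumps;
    }
    return m;
}
```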


Tuesday, August 21, 2012

SOCIS - part 2

This is the second part of my log on the SOCIS project.

Signal-to-noise ratio and bitrate

We recall from the previous post that compression can be either lossy or lossless. Regardless of the type of compression used, it's obvious that the compressed file has a smaller size than the original. Size, of course, is measured in units of information (bits, bytes and multiples of these). A hyperspectral image is made up of bands, each band contains pixels and each pixel is a value which can be represented using a certain number of bits. The size of the image can be computed as: number of bands * number of pixels per band * number of bits per pixel. We call the last value the bitrate (measured in bpppb - bits per pixel per band).
Signal-to-noise ratio (SNR) is a measure of the amount of useful information relative to the noise (here, the error introduced by lossy compression). It is measured in decibels (dB).
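As a rough illustration, here is a small sketch computing these two quantities for the Hyperion image used later in this post, assuming 16-bit raw samples (the actual bit depth depends on the file) and using the common 10*log10(signal energy / error energy) definition of SNR.

```cpp
// Raw vs. compressed size and a basic SNR computation, as a sketch.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    // Dimensions of the Mount St. Helens Hyperion image discussed below.
    const long bands = 242, height = 3242, width = 256;
    const long pixelsPerBand = height * width;
    const double rawBpppb = 16.0;      // assumed bit depth of the raw samples
    const double compressedBpppb = 0.2;

    std::printf("raw: %.1f MB, compressed: %.1f MB\n",
                bands * pixelsPerBand * rawBpppb / 8 / 1e6,
                bands * pixelsPerBand * compressedBpppb / 8 / 1e6);

    // SNR between an original and a reconstructed signal, in dB.
    auto snr = [](const std::vector<double>& orig, const std::vector<double>& rec) {
        double signal = 0.0, noise = 0.0;
        for (std::size_t i = 0; i < orig.size(); ++i) {
            signal += orig[i] * orig[i];
            noise += (orig[i] - rec[i]) * (orig[i] - rec[i]);
        }
        return 10.0 * std::log10(signal / noise);
    };
    std::printf("SNR = %.2f dB\n", snr({100, 200, 300}, {101, 198, 303}));
    return 0;
}
```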
It is useful to plot SNR versus bitrate to see how different transforms and compression methods behave. These are called rate-distortion plots, and we can see one in the figure below.

The 3 cases which can be seen in the plot are: lossy compression using BIFR (Band-Independent Fixed Rate) with no prior transform applied to the image, lossy compression using BIFR with the POT applied to the image and, finally, lossy compression using waterfill with the POT applied to the image. Band-Independent Fixed Rate means that the bitrate is split equally among bands, while in the waterfill case the bit budget is allocated jointly across all the bands of the image.
We notice that the greatest performance comes when using the POT with waterfill. This can also be seen when plotting SNR versus band number for each bitrate. These plots can be found here [1] and, for peak SNR, here [2].


POT artifacts at low bitrates

As mentioned in the previous post, when using the POT, certain artifacts can appear at low bitrates in the lossy compressed images because of the transform's line-based approach. The main focus of this project is to try to reduce these artifacts.
The image I've been working with so far is a hyperspectral image of Mount St. Helens that came from the Hyperion sensor of the Earth Observing-1 (EO-1) mission. The image, along with others, can be found here: [3]. In the figure below we can see this image (I've roughly extracted the bands for red, green and blue to create an RGB image):

The image has a height of 3242 pixels, a width of 256 pixels and 242 spectral bands. Now let's look at a comparison of lossy compressed versions of this image at a low bitrate of 0.2 bpppb:

The leftmost image is the original. The image in the middle is for the case in which we used the POT and BIFR. We immediately notice the artifacts and poor quality. The rightmost image is from the case in which we used the POT and waterfill, and we see that in this case the quality is much better, even though the bitrate is the same. This is as expected from the rate-distortion plot we saw earlier. There are, however, artifacts in the third image as well, and we can see them better if we apply a sharpening filter.
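For reference, the kind of sharpening that makes the artifacts stand out can be as simple as a 3x3 Laplacian-style kernel applied to each band; the sketch below is only an illustration, not the exact filter used for the figures.

```cpp
// Minimal sharpening sketch: a 3x3 Laplacian-style kernel applied to one
// band, stored row-major as width*height samples.
#include <vector>

std::vector<float> sharpen(const std::vector<float>& band, int width, int height) {
    // Center weight 5, four neighbours -1: original plus Laplacian.
    static const int k[3][3] = { { 0, -1,  0},
                                 {-1,  5, -1},
                                 { 0, -1,  0} };
    std::vector<float> out(band);               // borders left unchanged
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            float acc = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += k[dy + 1][dx + 1] * band[(y + dy) * width + (x + dx)];
            out[y * width + x] = acc;
        }
    }
    return out;
}
```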


Side information

When talking about image compression, we obviously need a means to reverse the process. The images we've just seen are not "compressed files" but rather the result of decompressing the compressed file and removing the POT. How can we remove the POT? Well, if we understand what the POT does to the original image, it's only a matter of applying these steps in reverse. First, each bandline is adjusted so that it has a zero mean; that just means that we subtract the mean from each bandline. Then a rotation matrix is applied to pairs of bandlines (vectors). The rotation matrix has a specific form and depends on one parameter called t, which varies from one pair of bandlines to another. Thus, if we knew all of the t parameters and all of the means, we could reverse the POT. Luckily, these values are stored in a file called the side information file. The t parameters are stored as half floats (16 bits) and the means as shorts (16-bit integers).
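To make the reversal concrete, here is a rough sketch of the inverse step for a single pair of bandlines. The exact rotation matrix used by Delta's POT may differ; for illustration I assume R(t) = [ p t ; -t p ] with p = sqrt(1 - t^2), whose inverse is simply its transpose.

```cpp
// Rough sketch of the inverse POT for one pair of bandlines, given the t
// parameter and the two means read from the side information file.
// The rotation matrix form below is an assumption of this sketch.
#include <cmath>
#include <cstddef>
#include <vector>

void inversePotPair(std::vector<float>& a, std::vector<float>& b,
                    float t, float meanA, float meanB) {
    const float p = std::sqrt(1.0f - t * t);
    for (std::size_t i = 0; i < a.size(); ++i) {
        // Undo the rotation applied to the pair (a[i], b[i]).
        const float x = p * a[i] - t * b[i];
        const float y = t * a[i] + p * b[i];
        // Add the means back to undo the zero-mean adjustment.
        a[i] = x + meanA;
        b[i] = y + meanB;
    }
}
```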
It is useful to plot these values, especially the t parameter, and observe how much they vary across multiple lines. I have plotted the means and the t parameter of the previous image for the first 500 lines (height 500):


On the x axis we have the band number and on the y axis the image line number.

Monday, August 6, 2012

SOCIS - part 1 (introduction)

In this post I will talk about the SOCIS project I have recently been accepted to: GICI Delta's "Stabilization of an adaptive multi-line transform". The project's description is:
Stabilization of an adaptive multi-line transform
One of the transforms available in Delta is the Pairwise Orthogonal Transform (POT), which is applied independently on a line-by-line basis. We would like to have the possibility of introducing a smoothing factor so that two adjacent lines could be coded using a transform adapted for a particular line, but also without a step discontinuity from adjacent lines, so that the coding performance is improved.
To elaborate on the subject, GICI stands for "Group on Interactive Coding of Images" and their main focus is the study of various image coding techniques, with a particular interest in satellite image coding. In this case, we are dealing with hyperspectral images.

Hyperspectral images


As we know, normal images are represented in the RGB color model. A 24-bit Bitmap image is an m x n matrix (m is the number of lines and n is the number of columns) in which 24 bits are used to represent each pixel. Since each pixel must be a mixture of red, green and blue, we have 8 bits per color. Therefore, we are actually dealing with an m x n x 3 matrix, where 3 is the number of colors, or spectral bands.
A hyperspectral image contains a large number of spectral bands, as opposed to the familiar 3, and is therefore an m x n x z matrix, where z is the number of bands. The properties characterizing a hyperspectral image are: its dimensions and number of bands, the precision used for representing values, the interleaving of the bands (more information can be found here [1]) and the byte order (little endian or big endian).
Some sample images can be found here [2] and a simple Matlab script that I have written for visualizing such images can be found here [3].
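Reading such a raw image is straightforward once these properties are known. The sketch below assumes BSQ (band-sequential) interleaving and unsigned 16-bit little-endian samples; the sample type and offsets would of course have to be adapted to the actual file.

```cpp
// Sketch of reading one band of a raw hyperspectral image (BSQ interleave,
// unsigned 16-bit little-endian samples assumed).
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

std::vector<std::uint16_t> readBandBSQ(const std::string& path,
                                       int width, int height, int band) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    const std::streamoff bandBytes =
        static_cast<std::streamoff>(width) * height * sizeof(std::uint16_t);
    in.seekg(band * bandBytes);  // in BSQ, whole bands are stored one after another

    std::vector<std::uint16_t> samples(static_cast<std::size_t>(width) * height);
    in.read(reinterpret_cast<char*>(samples.data()), bandBytes);
    if (!in) throw std::runtime_error("short read");
    return samples;  // values are already correct on a little-endian host
}
```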

Yellowstone


Compression


Since hyperspectral images can occupy a large amount of storage, a general compression scheme is needed in order to reduce their size. Compression can be either lossless (without loss of information) or lossy (with loss of information). The type of compression scheme we are interested in utilizes transform coding, which applies a transform to the initial data in order to obtain a better representation of the information content so that it can be compressed more easily. Commonly used transforms for hyperspectral images are the KLT (Karhunen-Loeve Transform) and the wavelet transform. The KLT has greater coding performance than wavelets, but also a higher computational cost, greater memory requirements, a more difficult implementation and a lack of scalability.

Karhunen-Loeve Transform


In a nutshell, the KLT is applied to a set of vectors which represent lines or bandlines from a given image. The transform depends on the correlation between the vectors, so the first step is to compute their covariance matrix. The transform matrix (the eigenvector basis of the covariance matrix) is obtained from this matrix and applied to each vector. In practice, determining the covariance matrix for a set of vectors is computationally expensive. To address this issue, the Pairwise Orthogonal Transform (POT) was developed, which has greater coding performance than the wavelet transform and lower computational requirements than the KLT.
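As a quick sketch of the idea (using the Eigen library purely for illustration, not Delta's actual implementation): each row of the data matrix is one component, the covariance between components is computed, and its eigenvector basis is the transform matrix applied to every vector.

```cpp
// Compact KLT sketch on a set of component vectors, using Eigen.
// Each row of `data` is one component, each column one sample.
#include <Eigen/Dense>

Eigen::MatrixXd klt(const Eigen::MatrixXd& data) {
    // 1) Remove the mean of each component.
    Eigen::MatrixXd centered = data.colwise() - data.rowwise().mean();

    // 2) Covariance matrix between components (the expensive step).
    Eigen::MatrixXd cov =
        centered * centered.transpose() / double(data.cols() - 1);

    // 3) The transform matrix is the eigenvector basis of the covariance.
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> solver(cov);
    Eigen::MatrixXd basis = solver.eigenvectors().transpose();

    // 4) Apply the transform to every (centered) vector.
    return basis * centered;
}
```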

Pairwise Orthogonal Transform


The POT, instead of computing the covariance matrix for all vectors (image components), uses a divide-and-conquer approach in which the resulting transform is a composition of smaller KLT transforms applied to pairs of image components. Assuming we have n components, a KLT is applied to components 1 and 2, 3 and 4, etc. Each pairwise transform produces 2 output components, of which only the first one is passed on to the next level. The process is then repeated with these new components, and so on. We immediately notice the reduced time complexity of the algorithm with respect to the full KLT. The POT is applied line by line, which leads to a reduction in memory usage. In the case of lossy compression at low bitrates, artifacts appear in the images because of the POT's line-based approach. The main objective of this project is to reduce these artifacts and improve coding performance by introducing a smoothing factor.
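To make the structure clearer, here is a rough, self-contained sketch of the multilevel pairing described above; klt2 is my own illustrative 2-component KLT (a decorrelating rotation), not Delta's code, and the means it removes would also have to be stored as side information in the real transform.

```cpp
// Structural sketch of the POT applied to one line: 2-component KLTs on
// pairs of components, with only the principal output of each pair passed
// on to the next level.
#include <cmath>
#include <cstddef>
#include <vector>

using Component = std::vector<float>;

// In-place 2-component KLT: subtract the means and rotate the pair so that
// the first output carries the maximum variance (the principal component).
void klt2(Component& u, Component& v) {
    double mu = 0, mv = 0;
    for (std::size_t i = 0; i < u.size(); ++i) { mu += u[i]; mv += v[i]; }
    mu /= u.size(); mv /= v.size();

    double cuu = 0, cvv = 0, cuv = 0;
    for (std::size_t i = 0; i < u.size(); ++i) {
        cuu += (u[i] - mu) * (u[i] - mu);
        cvv += (v[i] - mv) * (v[i] - mv);
        cuv += (u[i] - mu) * (v[i] - mv);
    }
    // Angle that diagonalizes the 2x2 covariance matrix of the pair.
    const double theta = 0.5 * std::atan2(2.0 * cuv, cuu - cvv);
    const double c = std::cos(theta), s = std::sin(theta);
    for (std::size_t i = 0; i < u.size(); ++i) {
        const double a = u[i] - mu, b = v[i] - mv;
        u[i] = static_cast<float>( c * a + s * b);  // principal component
        v[i] = static_cast<float>(-s * a + c * b);  // detail component
    }
}

void potOneLine(std::vector<Component*> active) {
    while (active.size() > 1) {
        std::vector<Component*> next;
        for (std::size_t i = 0; i + 1 < active.size(); i += 2) {
            klt2(*active[i], *active[i + 1]);  // pairs (1,2), (3,4), ...
            next.push_back(active[i]);         // only the principal goes on
        }
        if (active.size() % 2 == 1)            // an odd component passes through
            next.push_back(active.back());
        active = next;                         // repeat at the next level
    }
}
```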