Tuesday, October 16, 2012

SOCIS - part 10

In the previous post I mentioned that my mentor updated the POT so that the discontinuity in the t variable is moved away from the initial -1/1 point to -sqrt(2)/2. I thus added a new metric to the metrics-computing program to count the number of crossings over this new discontinuity. What I noticed was that the number of discontinuities of this kind is almost the same as the number of -1/1 jumps, and that average smoothing takes care of both of them. The results can be found here [1, 2].

In this new POT I incorporated average smoothing and tested whether the new results were indeed better than the old ones. They were more or less the same: while the difference is not noticeable on the SNR plots below, the new implementation is slightly better (its SNR values are larger, but not by much).





My latest attempt at incorporating resets into the average smoothing implementation is showing promise. I no longer get the discontinuities of previous attempts, but the image does become blurry again.
From the idea of deprecating previous covariance values, I thought about deprecating previous average values. First, why do we need the resets at all? Consider an image in which the upper half is water and the lower half is land. The averaging method will specialize in smoothing the upper part and perform badly on the lower part. This is because, as the procedure progresses line by line, the average is built from similar values of a certain region (here, the upper half, which is water). When we switch to a different region (the lower half, which is land), the average built from the 'water values' is no longer relevant. For this reason, previous values must be deprecated over time.
My idea was to keep 2 averages, which we shall name currentAverage and newAverage. The actual value used is a weighted sum of the two, namely (1 - w) * currentAverage + w * newAverage, where 0 <= w <= 1. If we divide the image into blocks of height NUM_LINES, then w = (currentLine mod NUM_LINES) / NUM_LINES. We can see that w increases as we approach the block's last line, so the bias towards currentAverage decreases while the bias towards newAverage increases. currentAverage is the average of all values from the current block and the previous block, while newAverage is the average of the values in the current block alone. The 'algorithm' is as follows:

for each line
    if currentLine mod NUM_LINES == 0
        switch(currentAverage, newAverage) // interchange the two averages
        reset(newAverage)
    update both currentAverage and newAverage
    average to be used = (1 - w) * currentAverage + w * newAverage.

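A compact sketch of what this scheme might look like in C++, one instance per band (all names here are mine, not the POT code's):

#include <cstddef>

// One smoother per band; the same scheme is applied to the means.
class BlockedAverageSmoother {
public:
    explicit BlockedAverageSmoother(std::size_t numLines) : N_(numLines) {}

    double smooth(double tCurrent) {
        if (line_ % N_ == 0) {     // block boundary:
            curSum_ = newSum_;     // currentAverage takes over newAverage
            curCnt_ = newCnt_;
            newSum_ = 0.0;         // and newAverage is reset
            newCnt_ = 0;
        }
        curSum_ += tCurrent; ++curCnt_;    // update both running averages
        newSum_ += tCurrent; ++newCnt_;
        const double w = double(line_ % N_) / double(N_);
        ++line_;
        return (1.0 - w) * (curSum_ / curCnt_) + w * (newSum_ / newCnt_);
    }

private:
    std::size_t N_;
    std::size_t line_ = 0;
    double curSum_ = 0.0, newSum_ = 0.0;
    std::size_t curCnt_ = 0, newCnt_ = 0;
};
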
For the image I've been using in my tests (the Mount St. Helens image), I've noticed that increasing NUM_LINES leads to better results, in the sense that we obtain smoother rotation and mean values. This makes sense for this image, as increasing NUM_LINES makes the method resemble 'traditional' average smoothing more and more. However, this image is roughly the same everywhere (in keeping with the previous analogy, we have land everywhere). The images I obtained after smoothing are blurry, but discontinuity-free. You can see one here [3], along with the corresponding means and rotations here [4, 5].

The problem of taking the sign change into account in the lossless case still remains. I'm still having implementation issues here: while my attempt seems correct to me, the results disagree (discontinuities appear). I'm still in the process of understanding why this happens.

Tuesday, October 9, 2012

SOCIS - part 9

Not much progress to report this week, unfortunately. College has started and this has severely affected my work. Apart from that, I've hit a few roadblocks.

I've tried repeatedly to somehow mix average smoothing and simple smoothing, but I either end up with discontinuities or reduce it to just simple smoothing. The problem is that after a reset all information about previous lines is gone, and average smoothing has to start over as if it were a new image. I tried replacing the tCurrent values in the average computation with 0.984 * tPrev + 0.016 * tCurrent, but the results are pretty much the same. I also tried the following: take tNew from average smoothing and, if tNew differs from tPrev by more than 10%, smooth this tNew using simple smoothing (with resets after N lines, of course). This has shown slightly better results, but discontinuities still appear and the image is blurry, like with the simple smoothing approach.
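
In code, this last variant looks roughly like this (a sketch; the names and the 1e-12 guard are mine):

#include <algorithm>
#include <cmath>

// Take the average-smoothed value and, if it strays more than 10% from
// the previous line's value, pull it back with the simple formula.
double hybridSmooth(double tAvg, double tPrev) {
    const double rel = std::abs(tAvg - tPrev)
                     / std::max(std::abs(tPrev), 1e-12);  // guard against 0
    return rel > 0.10 ? 0.98 * tPrev + 0.02 * tAvg : tAvg;
}

We see this here: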



Still exploring ways of getting a better result after taking resets into account. My mentor suggested a reset on the covariance that "deprecates" the old values to weigh less than the new ones; I shall try this approach. I was also thinking about saving the averages to disk each time. This doesn't add to the overall complexity, since it's a simple matter of writing a line of averages after each image line, but it does add to the running time, since IO operations are expensive.

My mentor has updated the POT so that the discontinuity in the t variable is moved away from the initial -1/1 point to -sqrt(2)/2. This, in combination with average smoothing, should yield better results, though probably not noticeably so on this image.

Other than that, there have been some implementation issues with the lossless case. The idea was to add a multiplication by s (which is either -1 or 1) to the lifting network for the second component (see the article for POT, mentioned in my first post). There have been some problems with this but hopefully I will solve them and explain in more detail in my next post.

Tuesday, October 2, 2012

SOCIS - part 8

As mentioned in the previous post, I will start with the correct SNR plots comparing average smoothing with simple smoothing, original POT and the single line POT:

BIFR

Reverse waterfill

The results seem quite good. We shall see soon if this holds for other images as well.

Early on, I mentioned that the smoothing solution would have to be simple and not susceptible to resets (which tend to happen on satellites). Average smoothing is susceptible to resets because it relies on information from all previous lines, in contrast with the simple smoothing approach, which only requires information from the previous line. In terms of memory use, the two techniques behave similarly:

In simple POT smoothing, for each tCurrent (mCurrent) I have to retain tPrev (mPrev) in order to compute tNew (mNew). So for each line I have to store the previous line's values => O(z) memory.
In the averaged smoothing approach, for each tCurrent (mCurrent) I have to retain the average of the previous t values (mean values), which is updated at each line in constant time, in order to compute tNew (mNew). So for each line I also retain a vector of averages of the previous values => O(z) memory.
In terms of memory, then, the two behave the same. However, if there is a reset, simple smoothing isn't affected much, since after the next line it will have a new vector of previous values. The average smoothing approach, on the other hand, loses all information about the previous lines and has to start over from the line where the reset happened.

It would thus be ideal if we could combine the two, and this is what I've been working on this week. Unfortunately, it's not an easy task: after trying many variations, the problem seems to be that discontinuities appear when switching from simple smoothing to average smoothing. I think this happens because average smoothing uses the averages of the tCurrent values, and these don't 'fit' after a stretch of simple smoothing. Here is how the t parameter looks:
In this example I used a block size of 400 lines, with average smoothing for the first 256 lines and simple smoothing for the rest; after 400 lines the procedure repeats. The discontinuities appear around the multiples of 400, indicating that the problem arises when starting a new block.
I will try to improve this by using something other than tCurrent in the average computation.

In parallel, I will also be working on the lossless case, which has been ignored so far: everything we've talked about until now relates to the lossy implementation. For now, I've rendered the image after a simple application of the lossless POT, which can be found here [1]. More on this topic will be covered in the next blog post.

Tuesday, September 25, 2012

SOCIS - part 7

After the POT is applied to an input image, we obtain a transformed image in which the value of each pixel no longer corresponds to a light intensity at a specific wavelength. We can still, however, plot bands from the transformed image and see what they look like. To that end, I initially plotted the transformed bands for: the original POT, the POT with the simple smoothing formula, and the POT in which we've faked a 0 mean for each line. These plots can be found here [1]. Let's inspect bands 13, 19 and 37; these are the bands I have used in all previous example images to represent the blue, green and red wavelengths.
I will only add here the plot for band 13, in the 3 cases. The rest can be found at [1].


This band is particularly interesting because, unlike the other 2, we can clearly see discontinuities in the original POT plot. These discontinuities in the transform domain seem to correspond to the discontinuities we've seen in the image after lossy compression, such as here [2].

What's important to note is that, in the smoothed plot, the discontinuities are gone. As we know, the smoothed image doesn't have discontinuities either, but there's still the problem of blurriness. Where does that come from? Maybe, as with the discontinuities, it is present in just one of the 3 bands chosen for red, green and blue and affects the entire composite image. But what can we do to solve this problem? To answer this, we have to plot another case.

A while back I tested applying the POT to the image as if it were a single line. This just means that if the image has dimensions Z, Y, X (Z = number of bands, Y = number of lines, X = number of columns), then we treat it as having dimensions Z, 1, X * Y. In that case, we obtained the following:


This looks better than our blurry image, but why is that? Well, I plotted the transformed single-line case [3], just like the others, and compared it to the smoothed case to see what was different. What I noticed was that both were free of discontinuities, but the single-line case had better contrast (e.g. the difference in intensities between a lake and its surroundings was greater than in the simple smoothing case). I realized that, in the single-line case, the mean is computed over all lines. So I thought that maybe for each line I could use a mean which is an average of all previous means. I tried that (rotations were still smoothed with the original formula) and didn't get a better result; the image was still blurry. However, a plot of the means showed that they were very nicely smoothed. I then decided to try the same thing for the rotations as well, and it seems to have worked. The result looks very similar to the single-line case:



The plots for the means and the rotations can be found here [4][5]. As can be seen, they are much smoother than anything previously obtained.
To explain in a bit more detail what I did: at each step we compute a rotation parameter tCurrent and a mean mCurrent, and the POT normally uses exactly these values. In this "averaged smoothing", the values actually used, let's call them tNew and mNew, are averages over all previous tCurrent or mCurrent values of the same band. Updating the average after each line can be done in constant time (keep the current sum of elements and the current element count), so there is no increase in either memory or time requirements.
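
In code, the bookkeeping is as small as it sounds (a sketch; names are illustrative):

#include <cstddef>

// Constant-time running average, one instance per band; the same
// structure serves both the t parameters and the means.
struct RunningAverage {
    double      sum   = 0.0;
    std::size_t count = 0;
    double push(double x) { sum += x; ++count; return sum / count; }
};

// Per line, per band z: tNew = tAvg[z].push(tCurrent);
//                       mNew = mAvg[z].push(mCurrent);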

An SNR plot will show how well this method behaves with respect to the original POT, the simple smoothing approach and the single-line attempt. I made a mistake when producing the SNR plots: I used SNR values for the original POT and PSNR for the others. The next post will contain the correct plots. We can still look at one of them, just substitute SNR with PSNR:



We can see how the "averaged smoothing" approach performs better than both simple smoothing and the single-line case. This PSNR plot is for reverse-waterfill rate allocation, not BIFR.

Tuesday, September 18, 2012

SOCIS - part 6

Picking up where the last post left off: there, I mentioned the idea of using a selective smoothing approach.

Since the variance of the differences is already computed after each line (as mentioned in the previous post), this quantity can be used to implement selective smoothing. We've noticed that after smoothing an image, the variance of the differences decreases; therefore, when implementing selective smoothing, I attempt to reduce the current variance. The condition for smoothing is that the relative difference between the current variance and the variance updated with |tCurrent - tPrev| be less than some threshold value.
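
Written out, the test is simply the following (a sketch; curVar is the running variance of the t differences so far, updVar is what it would become after including the new |tCurrent - tPrev|, both maintained by the online variance code from the previous post; the names are mine):

#include <cmath>

// Smooth only when including the new difference would not change the
// variance by more than the (tunable) threshold.
bool shouldSmooth(double curVar, double updVar, double threshold) {
    return std::abs(updVar - curVar) / curVar < threshold;
}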

Applying selective smoothing directly did not yield better results (the discontinuities vanish, but the image is still blurry). I proceeded to run some more tests to better determine the sources of both the discontinuities and the blurriness. While the discontinuities seem to originate from both the t values and the means, the blurriness seems to be mostly due to the means.

Faking a 0 mean for each line in the transform yields this:


What's interesting to note is that even though discontinuities are present, the image is not as blurry as it is after smoothing the means as well (in this image only the t values were smoothed).

To sum up, this week was mostly concerned with implementing selective smoothing and testing various approaches to see how they affect the image. I will now try to study the transformed image (right after applying the POT) to see what discontinuities appear in each band, and where, in each case (without faking a 0 mean, and with a 0 mean).

Tuesday, September 11, 2012

SOCIS - part 5

Continuing from part 4, we discuss smoothing and the methods employed to achieve it.

Last time we saw that certain anomalies appeared in the transformed image after the final smoothing attempt. The anomalies disappear when we do not take into account -1 to +1 (and vice versa) transitions of the t parameter and just use the simple smoothing formula tNew = 0.98 * tPrev + 0.02 * tCurrent.
The problem with this approach is that the resulting image, while free of discontinuities, becomes very blurry.

For my next attempt I've been testing a smoothing formula which is a weighted sum of the previous t values (and likewise for the means). The weight of each previous t value grows exponentially as we approach the current line, so the previous line has the largest weight. For example: tNew = (tCurrent + 2 * t(j - 3) + 4 * t(j - 2) + 8 * t(j - 1)) / 15.
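
A small sketch of this weighted window (names are illustrative; history holds the retained previous values, oldest first):

#include <vector>

// history[0] = t(j - K), ..., history.back() = t(j - 1): K previous
// values with doubling weights.
double weightedSmooth(const std::vector<double>& history, double tCurrent) {
    double num = tCurrent;            // tCurrent has weight 1
    double den = 1.0;
    double w = 2.0;
    for (double t : history) {        // oldest first: weights 2, 4, 8, ...
        num += w * t;
        den += w;
        w *= 2.0;
    }
    return num / den;                 // with K = 3 this is the formula above
}
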
This method yields slightly better results than the simple smoothing formula with just the previous t value, when the number of previous values used is 4. By 'slightly' I mean barely noticeable: the BIFR images for 0.2 bpppb look identical, while at 0.15 bpppb this method gives a slightly better smoothing.
You can see a comparison of the 2 cases below:

I've compared the metrics of this case with those from the simple smoothing formula. The variances and the -1 to +1 jumps are roughly the same, while metric 3 is much larger in the weighted sum implementation; since metric 3 matters for the lossless lifting decomposition, it doesn't have much influence on lossy compression. I've also tested with 2, 3 and 4 previous values: metrics 1 and 2 decrease and the discontinuities disappear, but the image becomes blurry, so it's clear that this method is not good enough (not to mention the memory requirements of retaining previous values).


When considering how to eliminate the discontinuities, the variance of the differences is clearly a factor, so reducing it is important; so is reducing the number of -1 to +1 (and vice versa) jumps. To that end, I have incorporated an online variance computation into the POT code, for use in smoothing. The implementation relies on the algorithm found here [1].
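
For reference, a minimal sketch of such an online update (presumably the linked algorithm; this is the Welford-style form, which updates in O(1) per value and avoids a second pass over the data):

#include <cstddef>

struct OnlineVariance {
    std::size_t n = 0;
    double mean = 0.0, m2 = 0.0;
    void push(double x) {
        ++n;
        const double delta = x - mean;
        mean += delta / n;
        m2   += delta * (x - mean);   // uses the updated mean
    }
    double variance() const { return n > 1 ? m2 / (n - 1) : 0.0; }
};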

The next thing worth exploring is perhaps selective smoothing, which might solve both the discontinuity and the blurring problems. The idea is to smooth only the components for which the t differences exceed a given threshold or change the variance significantly, while leaving the other values the way they are.


Monday, September 3, 2012

SOCIS - part 4

This post is concerned with a first attempt at a smoothing implementation incorporated into the POT software.

As mentioned in the beginning, the purpose of this project is to eliminate the discontinuities which arise from using the pairwise orthogonal transform. In the previous post we saw how the rotation parameter can vary strongly. One approach to the discontinuity problem is to 'smooth' this parameter. A simple way of doing so is the formula tNew = 0.875 * tPrev + 0.125 * tCurrent: the new value of the rotation parameter is biased mostly towards the previous value, which damps large variations.
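
In code, the update is a one-liner applied to each band's parameter at every line (a minimal sketch; the name is mine):

// Biased towards the previous line's value to damp large variations.
double smoothT(double tPrev, double tCurrent) {
    return 0.875 * tPrev + 0.125 * tCurrent;
}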

Let's see some comparative images. First, without any smoothing:


We've seen this before: the left-most image is the original, the middle image is after applying POT and BIFR (non-optimal rate allocation), and the right-most is after applying POT and waterfill. The discontinuities are clearly visible.
Now let's have a look at how things improve by using the previous formula:


It's somewhat better, but the discontinuities are still very much there. "VERT" comes from vertical, because the smoothing is vertical: the rotation parameters are smoothed using previous values from the same band.
Notice the variation of the t parameter after smoothing:

Clearly we can do better. For starters, let's see what happens if we increase the smoothing, that is, change the formula to have more bias towards the previous value, say tNew = 0.98 * tPrev + 0.02 * tCurrent.
In this case the images look like this:


Even better, but we still have discontinuities. And what about the t parameter?

We haven't taken something into account: the t parameter is periodic on the interval [-1, 1]. This means that if we have a current t value of -0.98 and a previous value of 0.8, then, because of the periodicity, the value we should actually use for tCurrent in the formula is 1.02, to obtain a final result slightly greater than 0.8. Care must be taken so that in situations like this the final result stays within the [-1, 1] interval.
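
A sketch of this wrap-aware update (with tPrev = 0.8 and tCurrent = -0.98 it uses 1.02, as in the example above; the name is mine):

// Map tCurrent to the periodic image nearest tPrev before blending,
// then fold the result back into [-1, 1].
double smoothPeriodic(double tPrev, double tCurrent) {
    if (tCurrent - tPrev > 1.0) tCurrent -= 2.0;  // take the short way around
    if (tPrev - tCurrent > 1.0) tCurrent += 2.0;
    double tNew = 0.98 * tPrev + 0.02 * tCurrent;
    if (tNew > 1.0)  tNew -= 2.0;                 // stay within [-1, 1]
    if (tNew < -1.0) tNew += 2.0;
    return tNew;
}

If we correct the formula to take this periodicity into account, we obtain the following: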


And:

In the BIFR image the discontinuities are still noticeable. There is one more thing we can do: smooth the means as well. Initially the means for the image looked like this:

After we apply the same smoothing formula (no worries about periodicity this time) we get this:

And our result for the image is:

This time we can say that no discontinuities are visible in either image.

There is still one thing to be fixed in this case: an anomaly visible in the image, namely discontinuities which appear in the northern part. They seem to have appeared after adapting the formula to take t's periodicity into account.


Correcting this shouldn't be a problem.

What comes next is to incorporate the metrics computations into the POT software and adjust the smoothing method accordingly; for now, the formula is fixed. In terms of implementation, this can be easily achieved by having a class which specifically handles this sort of thing and passing a reference to an instance of this class to all other methods.

Monday, August 27, 2012

SOCIS - part 3

In this third part of my log I will talk about determining metrics from images transformed with POT, metrics which will help in solving the discontinuity problem.

The main point of "stabilizing" the POT is that, given its line-by-line training, variations may occur in the actual applied transform that produce "discontinuities" along the vertical axis of the transformed image. Irregularities in the shape of discontinuities are hard to code, so the 2D coder applied afterwards doesn't work as well as it could. As we've seen in the previous post, at low bitrates these discontinuities become noticeable. They are clearly more prominent in the BIFR-compressed image than in the waterfill one; it thus seems that non-optimal rate allocation can worsen the discontinuity problem.

In order to address this issue, it is of course necessary to get a measure of the possible causes of the problem. One source of discontinuities could be the image itself, which naturally presents variations that are amplified by the POT. The second possibility is the transform "training" process. You will remember from the previous post that I talked about side information and the t parameter used in computing the rotation matrix. This parameter takes values between -1 and +1 and depends on the correlation between bandlines. If the two components (bandlines) are loosely correlated, then the t parameter can easily "jump" from -1 to +1. This is not desirable, as a t parameter which continually jumps like this will produce discontinuities in the transformed image. In the image below, we see a plot (like the one in the previous post) of this t parameter for a specific image.

Notice the variations between bandlines 150 and 200.
So, as mentioned in the previous post, one metric we'd like to determine is the variance of the differences between t values across lines. Another metric is the number of -1 to +1 (and vice versa) jumps the t parameter makes.
In the case of the lossless transform, the rotation matrix is decomposed into a product of elementary matrices, each equivalent to a sequence of lifting steps. This lifting decomposition applied to the KLT gives us the reversible KLT (RKLT), which guarantees the lossless nature of the transform. There are two possible lifting decompositions, depending on whether |t| >= |p| or |t| < |p|, where p = sqrt(1 - t^2). If t were to jump frequently between the |t| >= |p| and |t| < |p| regimes, we would again have discontinuities in the transformed image. Our third metric is thus the number of these p-jumps.

I have written a C++ program which, given the side information file of an image, computes these 3 metrics. The program was run on various images in order to see how the metrics vary from image to image, and to determine the best approach to the discontinuity problem.
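
A condensed sketch of the three computations (the two jump tests are my own guesses at reasonable detection rules, and I assume the t values of one band pair are already loaded into a vector; the real program reads them from the side information file):

#include <cmath>
#include <cstddef>
#include <vector>

struct Metrics {
    double varOfDiffs = 0.0;  // metric 1: variance of successive t differences
    int    signJumps  = 0;    // metric 2: -1 <-> +1 jumps
    int    pJumps     = 0;    // metric 3: |t| >= |p| <-> |t| < |p| changes
};

Metrics computeMetrics(const std::vector<double>& t) {
    Metrics m;
    if (t.size() < 2) return m;
    // metric 1: variance of the differences t[i] - t[i-1]
    std::vector<double> d(t.size() - 1);
    double mean = 0.0;
    for (std::size_t i = 1; i < t.size(); ++i) {
        d[i - 1] = t[i] - t[i - 1];
        mean += d[i - 1];
    }
    mean /= d.size();
    for (double x : d) m.varOfDiffs += (x - mean) * (x - mean);
    m.varOfDiffs /= d.size();
    // Since p = sqrt(1 - t^2), |t| >= |p| is equivalent to |t| >= sqrt(2)/2.
    const double kP = std::sqrt(2.0) / 2.0;
    for (std::size_t i = 1; i < t.size(); ++i) {
        if (std::abs(t[i] - t[i - 1]) > 1.0)   // crude test for an end-to-end jump
            ++m.signJumps;
        if ((std::abs(t[i]) >= kP) != (std::abs(t[i - 1]) >= kP))
            ++m.pJumps;
    }
    return m;
}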


Tuesday, August 21, 2012

SOCIS - part 2

This is the second part of my log on the SOCIS project.

Signal-to-noise ratio and bitrate

We recall from the previous post that compression can be either lossy or lossless. Regardless of the type used, the compressed file is obviously smaller than the original. Size, of course, is measured in units of information (bits, bytes and multiples of these). A hyperspectral image is made up of bands, each band contains pixels, and each pixel is a value which can be represented using a certain number of bits. The size of the image can therefore be computed as: number of bands * number of pixels per band * number of bits per pixel. We call the last value the bitrate (measured in bpppb, bits per pixel per band).
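
To make this concrete: assuming 16 bits per pixel, an image with 242 bands of 3242 x 256 pixels (like the one shown later in this post) occupies 242 * 3242 * 256 * 16 bits, roughly 400 MB, while the same image compressed at 0.2 bpppb takes only about 5 MB.
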
Signal-to-noise ratio (SNR) measures the amount of useful signal relative to the noise (here, the distortion introduced by lossy compression). It is measured in decibels (dB).
It is useful to plot SNR versus bitrate to see how different transforms and compression methods behave. These are called rate-distortion plots, and we can see one in the figure below.

The 3 cases which can be seen in the plot are: lossy compression using BIFR (Band-Independent Fixed Rate) with no prior transform applied to the image, lossy compression using BIFR with the POT applied to the image, and finally lossy compression using waterfill with the POT applied to the image. Band-Independent Fixed Rate means that the bitrate is split equally among the bands, while in the waterfill case the bitrate is allocated across the entire image, all bands included.
We notice that the greatest performance comes from using POT with waterfill. This can also be seen when plotting SNR versus band number for each bitrate. These plots can be found here [1], and the peak SNR ones here [2].


POT artifacts at low bitrates

As mentioned in the previous post, when using the POT at low bitrates, certain artifacts can appear in the lossy-compressed images because of the transform's line-based approach. The main focus of this project is to try to reduce these artifacts.
The image I've been working with so far is a hyperspectral image of Mount St. Helens, which comes from the Hyperion sensor of the Earth Observing-1 (EO-1) mission. The image, along with others, can be found here: [3]. In the figure below we can see this image (I've roughly extracted the bands for red, green and blue to create an RGB image):

The image has a height of 3242 pixels, a width of 256 pixels and 242 spectral bands. Now let's look at a comparison of a lossy compressed version of this image at a low bitrate of 0.2 bpppb:

The leftmost image is the original. The image in the middle is the case in which we used POT and BIFR; we immediately notice the artifacts and poor quality. The rightmost image is the case in which we used POT and waterfill, and the quality is much better at the very same bitrate, as expected from the rate-distortion plot we saw earlier. There are, however, artifacts in the third image as well, and we can see them better if we apply a sharpening filter.


Side information

When talking about image compression, we obviously need a means to reverse the process. The images we've just seen are not "compressed files" but rather the result of decompressing the compressed file and removing the POT. How can we remove the POT? Well, if we understand what the POT does to the original image, it's only a matter of applying the steps in reverse. First, each bandline is adjusted to have zero mean, which just means subtracting the mean from each bandline. Then a rotation matrix is applied to pairs of bandlines (vectors). The rotation matrix has a specific form and depends on one parameter called t, which varies from pair to pair of bandlines. Thus, if we knew all of the t parameters and all of the means, we could reverse the POT. Luckily, these values are stored in a file called the side information file. The t parameters are stored as half floats (16 bits) and the means as shorts (16-bit integers).
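
To illustrate, here is a sketch of undoing one rotation step. I am assuming a Givens-style matrix R = [[p, t], [-t, p]] with p = sqrt(1 - t^2); the exact layout is defined in the POT paper, so treat this form as an assumption. What is certain is that R is orthogonal, so its inverse is its transpose:

#include <cmath>
#include <cstddef>
#include <vector>

// Undo one POT pair: rotate back with R^T, then add the stored means.
void unrotatePair(std::vector<double>& a, std::vector<double>& b,
                  double t, double meanA, double meanB) {
    const double p = std::sqrt(1.0 - t * t);
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double x = a[i], y = b[i];
        a[i] = p * x - t * y + meanA;  // first row of R^T, plus the mean
        b[i] = t * x + p * y + meanB;  // second row of R^T, plus the mean
    }
}
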
It is useful to plot these values, especially the t parameter, and observe the variance of its variation across lines. I have plotted the means and the t parameter for the previous image, for the first 500 lines (height 500):


On the x axis we have the band number and on the y axis the image line number.

Monday, August 6, 2012

SOCIS - part 1 (introduction)

In this post I will talk about the SOCIS project I have recently been accepted to: GICI Delta's "Stabilization of an adaptive multi-line transform". The project's description is:
Stabilization of an adaptive multi-line transform
One of the transforms available in Delta is the Pairwise Orthogonal Transform (POT), which is applied independently on a line-by-line basis. We would like to have the possibility of introducing a smoothing factor so that two adjacent lines could be coded using a transform adapted for a particular line, but without a step discontinuity from adjacent lines, so that the coding performance is improved.
To elaborate on the subject: GICI stands for "Group on Interactive Coding of Images", and their main focus is the study of various image coding techniques, with a particular interest in satellite image coding. In this case, we are dealing with hyperspectral images.

Hyperspectral images


As we know, normal images are represented in the RGB color model. A 24-bit bitmap image is an m x n matrix (m is the number of lines and n is the number of columns) in which 24 bits are used to represent each pixel. Since each pixel must be a mixture of red, green and blue, we have 8 bits per color. Therefore, we are actually dealing with an m x n x 3 matrix, where 3 is the number of colors, or spectral bands.
A hyperspectral image contains a large number of spectral bands, as opposed to the familiar 3, and is therefore an m x n x z matrix, where z is the number of bands. The properties characterizing a hyperspectral image are: its dimensions and number of bands, the precision used for representing values, the interleaving of the bands (more information can be found here [1]) and the byte order (little endian or big endian).
Some sample images can be found here [2] and a simple Matlab script that I have written for visualizing such images can be found here [3].
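
As a small illustration of these properties, here is a sketch for loading one band, assuming BSQ interleaving and 16-bit little-endian samples (the dimensions and parameters are illustrative; in practice they come from the image's documentation):

#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read band 'band' from a BSQ file: all samples of band 0, then all of
// band 1, and so on.
std::vector<std::uint16_t> readBandBSQ(const std::string& path,
                                       int band, int lines, int cols) {
    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint16_t> data(static_cast<std::size_t>(lines) * cols);
    in.seekg(static_cast<std::streamoff>(band) * lines * cols
             * static_cast<std::streamoff>(sizeof(std::uint16_t)));
    in.read(reinterpret_cast<char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(std::uint16_t)));
    return data;  // on a little-endian machine no byte swap is needed
}
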

Yellowstone


Compression


Since hyperspectral images can occupy a large amount of memory, a general compression scheme is needed in order to reduce their size. Compression can be either lossless (without loss of information) or lossy (with loss of information). The type of compression scheme we are interested in uses transform coding, which applies a transform to the initial data in order to obtain a representation whose information content can be compressed more easily. Commonly used transforms for hyperspectral images are the KLT (Karhunen-Loeve Transform) and the wavelet transform. The KLT has greater coding performance than wavelets, but also a higher computational cost, greater memory requirements, a more difficult implementation and a lack of scalability.

Karhunen-Loeve Transform


In a nutshell, the KLT is applied to a set of vectors which represent lines or bandlines of a given image. The transform depends on the correlation between the vectors, so the first step is to compute their covariance matrix. From this matrix the transform matrix is obtained and applied to each vector. In practice, determining the covariance matrix for a set of vectors is computationally expensive. To address this issue, the Pairwise Orthogonal Transform (POT) was developed, which has greater coding performance than the wavelet transform and lower computational requirements than the KLT.

Pairwise Orthogonal Transform


The POT, instead of computing the covariance matrix for all vectors (image components), uses a divide-and-conquer approach in which the resulting transform is a composition of small KLTs applied to pairs of image components. Assuming we have n components, a KLT is applied to components 1 and 2, to 3 and 4, etc. Each pairwise transform produces 2 new components, of which only the first is carried on to the next level. The process is then repeated with these new components, and so on. We immediately notice the reduced time complexity of the algorithm with respect to the full KLT. The POT is applied line by line, which also reduces memory usage. In the case of lossy compression at low bitrates, artifacts appear in the images because of the POT's line-based approach. The main objective of this project is to reduce these artifacts and improve coding performance by introducing a smoothing factor.
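
To make the structure concrete, here is a sketch of the pairing scheme in C++. It is a simplification: the real POT keeps the second output of each pair and its side information rather than discarding them, works line by line, and the 2-component KLT here is just the textbook Jacobi rotation that diagonalizes a 2x2 covariance matrix.

#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Component = std::vector<double>;

// 2-component KLT: remove the means, then rotate the pair by the angle
// that diagonalizes its 2x2 covariance; return the first output (up to
// ordering, the principal component).
Component klt2(const Component& a, const Component& b) {
    const std::size_t n = a.size();
    double ma = 0.0, mb = 0.0;
    for (std::size_t i = 0; i < n; ++i) { ma += a[i]; mb += b[i]; }
    ma /= n; mb /= n;
    double vaa = 0.0, vbb = 0.0, vab = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        vaa += (a[i] - ma) * (a[i] - ma);
        vbb += (b[i] - mb) * (b[i] - mb);
        vab += (a[i] - ma) * (b[i] - mb);
    }
    const double theta = 0.5 * std::atan2(2.0 * vab, vaa - vbb);
    Component out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::cos(theta) * (a[i] - ma) + std::sin(theta) * (b[i] - mb);
    return out;
}

// Pair up the components level by level, carrying only the principal
// output of each pair to the next level (assumes at least one component).
Component potReduce(std::vector<Component> comps) {
    while (comps.size() > 1) {
        std::vector<Component> next;
        for (std::size_t i = 0; i + 1 < comps.size(); i += 2)
            next.push_back(klt2(comps[i], comps[i + 1]));
        if (comps.size() % 2 == 1)      // an odd component passes through
            next.push_back(comps.back());
        comps = std::move(next);
    }
    return comps.front();
}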