Improved chroma prediction

Cisco
Lysaker
Norway
stemidts@cisco.com

This document describes the technique used to improve the chroma
prediction in the Thor video codec.

Modern video coding standards such as Thor form predictions
for the luma channel (Y) and chroma channels (U and V) which are
encoded separately (in that order). The prediction for each
channel has spatial or temporal dependencies only in its own
channel. Most of the perceived information of a video is to be
found in the luma channel, but there still remain correlations
between the luma and chroma channels. For instance, the same
shape of an object can often be seen in all three channels, and
if this correlation is not exploited, some structural
information will be transmitted three times. Thor will attempt
to improve the chroma prediction by finding linear relationships
between each of the initial chroma predictions and the luma
prediction, and if certain criteria are satisfied, use that
relationship to form a new prediction based on the reconstructed
luma samples.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

The improved predictions are derived from the reconstructed
luma samples using a mapping. The underlying assumption is
that the colours can be identified by their luminosities.
Informally we can say that a new chroma prediction is formed
from the reconstructed luma block painted with the colours of
the initial chroma prediction.

There is often a linear correlation between the luma and
chroma channels, so that a chroma sample c can be expressed by
the linear function

   c = a*y + b

where y is the corresponding luma sample. This observation
has previously been used in techniques to convert YUV
4:2:0 and YUV 4:2:2 images to YUV 4:4:4, and in a (rejected)
proposal for HEVC as a special intra mode. Thor, however,
generalises the prediction, so it does not depend on the
coding mode (i.e. whether inter or intra, or the kind of
inter/intra mode).

Since it would be too costly to transmit the values a and b in
the linear mapping, and since both the encoder and decoder
must be able to compute identical predictions, a and b are
derived from data available to both using linear regression.

Since the assumption that the correlation is the same in the
predicted block and in the reconstructed block is not always
true, the new prediction from luma might not be better even
when there is a very good correlation in the predicted block.
Therefore, we can only expect an improvement if the initial
prediction is bad, and the luma residual is used as an
estimate for this. The initial chroma prediction is kept
unless the average squared difference between the
reconstructed luma samples yr and the predicted y samples for
an N*N prediction block is above 64:
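
The residual test described above can be sketched as follows. This is a minimal illustration written from the prose description only; the function name and the list-of-lists block layout are assumptions made for the example, not part of Thor.

```python
def luma_residual_is_large(yr, y, threshold=64):
    """Return True when the mean squared luma residual exceeds threshold.

    yr: reconstructed luma block, y: predicted luma block, both given as
    N x N lists of integer samples (a layout assumed for this sketch).
    """
    n = len(y)
    # Sum of squared differences over the whole N*N block.
    sse = sum((yr[i][j] - y[i][j]) ** 2 for i in range(n) for j in range(n))
    # Comparing sse against threshold * N^2 avoids a division and keeps
    # the test in integer arithmetic.
    return sse > threshold * n * n
```

Only when this test returns True would the improved prediction be considered; otherwise the initial chroma prediction is kept.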
The encoder and decoder must compute a and b using the same
least-squares fit for an N*N prediction block, where y and c denote
the luma and chroma samples in the initial prediction:
These sums will all be contained within a 32 bit signed integer when
the internal bitdepth is 8. Otherwise 64 bit integers must be used.
Then the following must be computed using 64 bit arithmetic
regardless of bitdepth:

Still using 64 bit arithmetic, if
then it is assumed that the correlation is reasonably good and
a new prediction will be computed and used. Otherwise, the
initial prediction will be kept. First, a and b must be
computed. 2^15 is added to b to ensure correct rounding later
on.
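
One way to carry out this step in integer arithmetic is sketched below. The 16 fractional bits are an assumption inferred from the 2^15 rounding term, and the function name, the intermediate names ssyy and ssyc, and the exact order of operations are hypothetical; the sketch is not necessarily bit-exact with Thor.

```python
SHIFT = 16  # assumed number of fractional bits; 2^15 is then half a unit


def fixed_point_coefficients(ysum, csum, yysum, ycsum, n):
    """Derive fixed-point a and b from least-squares sums of an N x N block.

    Python integers are unbounded, so the 64 bit requirement is met
    implicitly. Note that // floors, whereas C division truncates.
    """
    n2 = n * n
    ssyy = n2 * yysum - ysum * ysum  # N^2 times the luma sum of squares
    ssyc = n2 * ycsum - ysum * csum  # N^2 times the luma/chroma cross term
    if ssyy == 0:
        return None  # flat luma: no slope can be estimated
    a = (ssyc << SHIFT) // ssyy
    # The extra 2^(SHIFT-1) makes the later right shift round to nearest.
    b = ((csum << SHIFT) - a * ysum) // n2 + (1 << (SHIFT - 1))
    return a, b
```

With the sums of a block where c = 2*y + 5, this yields a = 2 * 2^16 and b = 5 * 2^16 + 2^15.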
The final operations are performed with 32 bit arithmetic, so
a must be clipped to [-2^(31-B), 2^(31-B)], where B is the
bitdepth, and b must be clipped to [-2^31, 2^31-1]. Then a new
chroma prediction c' is computed using the reconstructed luma
samples yr, a and b, and a clipping function saturating the
results to an 8 bit value:
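
The clipping and the final mapping can be sketched as follows. The right shift by 16 is an assumption inferred from the 2^15 rounding term added to b, and generalising the output range to the bitdepth B (rather than a fixed 8 bits) is likewise an assumption; the function names are hypothetical.

```python
def clip(value, low, high):
    """Saturate value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def predict_chroma(yr, a, b, bitdepth=8, shift=16):
    """Map reconstructed luma samples yr to a new chroma prediction c'."""
    # Clip the coefficients to the ranges given in the text.
    a = clip(a, -(1 << (31 - bitdepth)), 1 << (31 - bitdepth))
    b = clip(b, -(1 << 31), (1 << 31) - 1)
    max_sample = (1 << bitdepth) - 1  # 255 for 8 bit content
    # b is assumed to already contain the 2^15 rounding offset.
    return [[clip((a * s + b) >> shift, 0, max_sample) for s in row]
            for row in yr]
```

For example, with a = 2 * 2^16 and b = 5 * 2^16 + 2^15, a luma sample of 10 maps to 25, and a luma sample of 200 saturates at 255.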
The above assumes 4:4:4 format. For the 4:2:0 format the
predicted luma block must be subsampled first:
The resulting new chroma prediction must also be subsampled. The clipping is performed before the subsampling.
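
A subsampling step for 4:2:0 could look like the sketch below. The text does not state which filter is used here, so plain 2x2 averaging with rounding is an assumption made purely for illustration, as is the function name.

```python
def subsample_2x2(block):
    """Halve an N x N block in both directions by averaging 2x2 groups.

    N is assumed to be even. Adding 2 before the shift rounds to nearest.
    """
    n = len(block)
    return [[(block[i][j] + block[i][j + 1]
              + block[i + 1][j] + block[i + 1][j + 1] + 2) >> 2
             for j in range(0, n, 2)]
            for i in range(0, n, 2)]
```

Under this assumption, a 4x4 block is reduced to 2x2, and any clipping of the new chroma prediction would be applied to the full-resolution samples before this function is called.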
In intra mode the chroma prediction improvement must be
performed right after each transform, since the new chroma
reconstruction will be used to predict the next block.
The improved chroma prediction may significantly improve the
compression efficiency for images or video containing high
correlations between the channels. It is particularly useful for
encoding screen content, 4:4:4 content, high frequency content and
"difficult" content where traditional prediction techniques perform
poorly. Little quality change is seen for content not in these
categories, but there is a general small increase in chroma PSNR.
An encoder configured for low delay and high complexity was used
for the following results. The numbers have been computed using the
Bjontegaard Delta Rate (BDR). The rates
for Y, U and V are shown separately.
This document has no IANA considerations yet. TBD

This document has no security considerations yet. TBD

The author would like to thank Arild Fuldseth and Mo Zanaty
for reviewing this document, and Timothy Terriberry for pointing
out a couple of errors in the first draft.

Calculation of average PSNR differences between RD-curves