range_convert: saturate output instead of limiting input

Michael Niedermayer Sat, 30 Nov 2024 17:39:33 -0800

Hi Ramiro

On Sat, Nov 30, 2024 at 04:23:36PM +0100, Ramiro Polla wrote:
> For bit depths <= 14, the result is saturated to 15 bits.
> For bit depths > 14, the result is saturated to 19 bits.
> 
> x86_64:
> chrRangeFromJpeg8_1920_c:    5827.4   5804.5  ( 1.00x)
> chrRangeFromJpeg16_1920_c:   5793.2   5792.8  ( 1.00x)
> chrRangeToJpeg8_1920_c:     11726.2   9388.6  ( 1.25x)
> chrRangeToJpeg16_1920_c:    10610.8   5796.5  ( 1.83x)
> lumRangeFromJpeg8_1920_c:    4165.7   4147.9  ( 1.00x)
> lumRangeFromJpeg16_1920_c:   4530.0   4529.0  ( 1.00x)
> lumRangeToJpeg8_1920_c:      6044.8   5694.1  ( 1.06x)
> lumRangeToJpeg16_1920_c:     5343.6   5334.2  ( 1.00x)
> 
> aarch64 A55:
> chrRangeFromJpeg8_1920_c:   28839.3  28833.8  ( 1.00x)
> chrRangeFromJpeg16_1920_c:  28843.8  28842.8  ( 1.00x)
> chrRangeToJpeg8_1920_c:     44196.1  23070.6  ( 1.92x)
> chrRangeToJpeg16_1920_c:    36526.7  17313.8  ( 2.11x)
> lumRangeFromJpeg8_1920_c:   15384.3  15388.1  ( 1.00x)
> lumRangeFromJpeg16_1920_c:  15390.1  15388.0  ( 1.00x)
> lumRangeToJpeg8_1920_c:     23066.7  19226.2  ( 1.20x)
> lumRangeToJpeg16_1920_c:    19224.6  19225.5  ( 1.00x)
> 
> aarch64 A76:
> chrRangeFromJpeg8_1920_c:    6316.2   6317.8  ( 1.00x)
> chrRangeFromJpeg16_1920_c:   6321.9   6322.9  ( 1.00x)
> chrRangeToJpeg8_1920_c:     11389.3   9287.1  ( 1.23x)
> chrRangeToJpeg16_1920_c:     9514.4   6104.9  ( 1.56x)
> lumRangeFromJpeg8_1920_c:    4376.0   4359.1  ( 1.00x)
> lumRangeFromJpeg16_1920_c:   4437.9   4358.8  ( 1.02x)
> lumRangeToJpeg8_1920_c:      6667.0   5957.2  ( 1.12x)
> lumRangeToJpeg16_1920_c:     6062.5   6072.5  ( 1.00x)
> 
> NOTE: all simd optimizations for range_convert have been disabled
>       except for x86, which already had the same behaviour.
>       they will be re-enabled when they are fixed for each architecture.
> ---
>  libswscale/aarch64/swscale.c                  |  5 +++++
>  libswscale/loongarch/swscale_init_loongarch.c |  5 +++++
>  libswscale/riscv/swscale.c                    |  5 +++++
>  libswscale/swscale.c                          | 21 ++++++++++++-------
>  libswscale/x86/range_convert.asm              |  3 ---
>  5 files changed, 29 insertions(+), 10 deletions(-)


[...]

> @@ -160,8 +160,10 @@ static void chrRangeToJpeg_c(int16_t *dstU, int16_t *dstV, int width)
>  {
>      int i;
>      for (i = 0; i < width; i++) {
> -        dstU[i] = (FFMIN(dstU[i], 30775) * 4663 - 9289992) >> 12; // -264
> -        dstV[i] = (FFMIN(dstV[i], 30775) * 4663 - 9289992) >> 12; // -264
> +        int U = (dstU[i] * 4663 - 9289992) >> 12; // -264
> +        int V = (dstV[i] * 4663 - 9289992) >> 12; // -264

The way this is written, it triggers undefined behavior if the input to the
function is too large.
Either the input has to be ensured somewhere not to be too large (I didn't
check the rest of the patchset to see whether that is done somewhere),
or the operations have to be unsigned, in case such values do not occur with
real input.
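
To illustrate the "unsigned operations" option, here is a minimal sketch and
not the patch's code. It assumes a 32-bit sample type (as the 16-bit variants
use), for which the signed multiply could overflow. The names sat_u15() and
chr_to_jpeg_sketch() are hypothetical, and clamping on both sides is an
assumption about what "saturated to 15 bits" means.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a libswscale function: clamp v to [0, 2^15 - 1]. */
static inline int sat_u15(int v)
{
    const int max = (1 << 15) - 1;
    assert(max == 32767);
    return v < 0 ? 0 : v > max ? max : v;
}

/*
 * Chroma limited-to-full range step with the constants from the quoted hunk.
 * The multiply and subtract are done in uint32_t, so they wrap modulo 2^32
 * instead of overflowing a signed int. Converting back to int and shifting
 * a negative value right are implementation-defined, not undefined; this
 * sketch assumes the usual two's-complement, arithmetic-shift behaviour.
 */
static inline int chr_to_jpeg_sketch(int32_t in)
{
    int scaled = (int)((uint32_t)in * 4663U - 9289992U) >> 12;
    return sat_u15(scaled);
}
```

With these constants, an input of 30775 (the old FFMIN limit) maps exactly to
32767, and anything larger is caught by the output clamp rather than by
limiting the input.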


[...]
> @@ -215,7 +221,8 @@ static void lumRangeToJpeg16_c(int16_t *_dst, int width)
>      int i;
>      int32_t *dst = (int32_t *) _dst;
>      for (i = 0; i < width; i++) {

> -        dst[i] = ((int)(FFMIN(dst[i], 30189 << 4) * 4769U - (39057361 << 2))) >> 12;
> +        int Y = ((int)(FFMIN(dst[i], 30189 << 4) * 4769U - (39057361 << 2))) >> 12;

This still limits the input (the FFMIN() on dst[i] is unchanged).
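
For comparison, one possible way to saturate the output here instead is
sketched below. It is an assumption, not what the patch does: it widens the
intermediate to 64 bits so that inputs above 30189 << 4 cannot wrap, then
clamps the result to 19 bits. The names sat_u19() and lum_to_jpeg16_sketch()
are hypothetical, and a 64-bit intermediate may cost more than the 32-bit
form on some targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a libswscale function: clamp v to [0, 2^19 - 1]. */
static inline int sat_u19(int64_t v)
{
    const int max = (1 << 19) - 1;
    assert(max == 524287);
    return v < 0 ? 0 : v > max ? max : (int)v;
}

/*
 * Luma limited-to-full range step for the 16-bit path, using the constants
 * from the quoted hunk but without the FFMIN() on the input. The 64-bit
 * intermediate cannot overflow for any int32_t input, so only the output
 * needs clamping. Right-shifting a negative value is implementation-defined;
 * this sketch assumes an arithmetic shift.
 */
static inline int lum_to_jpeg16_sketch(int32_t in)
{
    int64_t y = ((int64_t)in * 4769 - (INT64_C(39057361) << 2)) >> 12;
    return sat_u19(y);
}
```

At the old input limit (30189 << 4) this gives the same value as the 32-bit
expression; larger inputs saturate at 524287 instead of being clipped on
entry.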


thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato


Re: [FFmpeg-devel] [PATCH v3 1/7] swscale/range_convert: saturate output instead of limiting input