Hi Ramiro

On Sat, Nov 30, 2024 at 04:23:36PM +0100, Ramiro Polla wrote:
> For bit depths <= 14, the result is saturated to 15 bits.
> For bit depths > 14, the result is saturated to 19 bits.
>
> x86_64:
> chrRangeFromJpeg8_1920_c:    5827.4     5804.5 ( 1.00x)
> chrRangeFromJpeg16_1920_c:   5793.2     5792.8 ( 1.00x)
> chrRangeToJpeg8_1920_c:     11726.2     9388.6 ( 1.25x)
> chrRangeToJpeg16_1920_c:    10610.8     5796.5 ( 1.83x)
> lumRangeFromJpeg8_1920_c:    4165.7     4147.9 ( 1.00x)
> lumRangeFromJpeg16_1920_c:   4530.0     4529.0 ( 1.00x)
> lumRangeToJpeg8_1920_c:      6044.8     5694.1 ( 1.06x)
> lumRangeToJpeg16_1920_c:     5343.6     5334.2 ( 1.00x)
>
> aarch64 A55:
> chrRangeFromJpeg8_1920_c:   28839.3    28833.8 ( 1.00x)
> chrRangeFromJpeg16_1920_c:  28843.8    28842.8 ( 1.00x)
> chrRangeToJpeg8_1920_c:     44196.1    23070.6 ( 1.92x)
> chrRangeToJpeg16_1920_c:    36526.7    17313.8 ( 2.11x)
> lumRangeFromJpeg8_1920_c:   15384.3    15388.1 ( 1.00x)
> lumRangeFromJpeg16_1920_c:  15390.1    15388.0 ( 1.00x)
> lumRangeToJpeg8_1920_c:     23066.7    19226.2 ( 1.20x)
> lumRangeToJpeg16_1920_c:    19224.6    19225.5 ( 1.00x)
>
> aarch64 A76:
> chrRangeFromJpeg8_1920_c:    6316.2     6317.8 ( 1.00x)
> chrRangeFromJpeg16_1920_c:   6321.9     6322.9 ( 1.00x)
> chrRangeToJpeg8_1920_c:     11389.3     9287.1 ( 1.23x)
> chrRangeToJpeg16_1920_c:     9514.4     6104.9 ( 1.56x)
> lumRangeFromJpeg8_1920_c:    4376.0     4359.1 ( 1.00x)
> lumRangeFromJpeg16_1920_c:   4437.9     4358.8 ( 1.02x)
> lumRangeToJpeg8_1920_c:      6667.0     5957.2 ( 1.12x)
> lumRangeToJpeg16_1920_c:     6062.5     6072.5 ( 1.00x)
>
> NOTE: all simd optimizations for range_convert have been disabled
> except for x86, which already had the same behaviour.
> they will be re-enabled when they are fixed for each architecture.
> ---
>  libswscale/aarch64/swscale.c                  |  5 +++++
>  libswscale/loongarch/swscale_init_loongarch.c |  5 +++++
>  libswscale/riscv/swscale.c                    |  5 +++++
>  libswscale/swscale.c                          | 21 ++++++++++++-------
>  libswscale/x86/range_convert.asm              |  3 ---
>  5 files changed, 29 insertions(+), 10 deletions(-)
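The sketch below shows what "saturated to 15 bits" means for the 8-bit chroma path. The clamp moves from the input (the old FFMIN(x, 30775)) to the result. The names chr_range_to_jpeg_sat15 and MIN_INT are hypothetical and are not FFmpeg's. The quoted hunk ends before the store, so it is an assumption that the result is clamped only at the top with a plain min. The sketch also assumes non-negative inputs and an arithmetic right shift on negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's FFMIN(). */
#define MIN_INT(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical sketch of the 8-bit chroma MPEG -> JPEG range conversion
 * with the clamp moved from the input to the result (15-bit saturation).
 * Assumes non-negative inputs, so the result never drops below -2269 and
 * always fits in int16_t after the upper clamp. */
static void chr_range_to_jpeg_sat15(int16_t *dstU, int16_t *dstV, int width)
{
    assert(width >= 0);
    for (int i = 0; i < width; i++) {
        /* int16_t promotes to int; 32767 * 4663 still fits in 32 bits. */
        int U = (dstU[i] * 4663 - 9289992) >> 12;
        int V = (dstV[i] * 4663 - 9289992) >> 12;
        /* Saturate the result to the largest 15-bit value. */
        dstU[i] = (int16_t)MIN_INT(U, (1 << 15) - 1);
        dstV[i] = (int16_t)MIN_INT(V, (1 << 15) - 1);
    }
}
```

With these constants, 30775 (the old input clamp) maps exactly to 32767, so the old and new forms agree up to that point. Above it, for example at 32767, the unclamped expression gives 35034, and the output clamp brings it back to 32767.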
[...]
> @@ -160,8 +160,10 @@ static void chrRangeToJpeg_c(int16_t *dstU, int16_t *dstV, int width)
>  {
>      int i;
>      for (i = 0; i < width; i++) {
> -        dstU[i] = (FFMIN(dstU[i], 30775) * 4663 - 9289992) >> 12; // -264
> -        dstV[i] = (FFMIN(dstV[i], 30775) * 4663 - 9289992) >> 12; // -264
> +        int U = (dstU[i] * 4663 - 9289992) >> 12; // -264
> +        int V = (dstV[i] * 4663 - 9289992) >> 12; // -264

The way this is written, it triggers undefined behavior if the input to
the function is too large. Either the input has to be guaranteed somewhere
not to be too large (I didn't check whether the rest of the patchset does
that), or the operations have to be unsigned, in case such large values
do not occur with real input.

[...]
> @@ -215,7 +221,8 @@ static void lumRangeToJpeg16_c(int16_t *_dst, int width)
>      int i;
>      int32_t *dst = (int32_t *) _dst;
>      for (i = 0; i < width; i++) {
> -        dst[i] = ((int)(FFMIN(dst[i], 30189 << 4) * 4769U - (39057361 << 2))) >> 12;
> +        int Y = ((int)(FFMIN(dst[i], 30189 << 4) * 4769U - (39057361 << 2))) >> 12;

This still limits the input with FFMIN.

thx

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato
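The sketch below illustrates the two options for avoiding undefined behavior in the multiply. It reuses the constants from the quoted chrRangeToJpeg_c lines, but with a 32-bit input, as a stand-in for the functions that operate on int32_t. With an int16_t input the product stays within 32 bits. With an int32_t input, a signed `x * 4663` is only defined for 0 <= x <= INT32_MAX / 4663 = 460536, while a 19-bit value can reach 524287. The function names and MAX_SAFE_INPUT are hypothetical. The sketch assumes int is 32 bits and that converting an out-of-range value to int32_t wraps, which is implementation-defined behavior that holds for GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Largest non-negative x for which x * 4663 fits in a 32-bit int:
 * INT32_MAX / 4663 == 460536. (Hypothetical helper macro.) */
#define MAX_SAFE_INPUT (INT32_MAX / 4663)

/* Option 1 (hypothetical name): keep signed arithmetic and require the
 * caller to guarantee a bounded input. Outside this range the multiply
 * overflows, which is undefined behavior. */
static int32_t to_jpeg_signed(int32_t x)
{
    assert(x >= 0 && x <= MAX_SAFE_INPUT);
    return (x * 4663 - 9289992) >> 12;
}

/* Option 2 (hypothetical name): multiply and subtract in uint32_t, which
 * wraps modulo 2^32 instead of overflowing, then convert back to int32_t
 * before the shift, similar to the quoted lumRangeToJpeg16_c line.
 * The conversion of an out-of-range value to int32_t is implementation-
 * defined (it wraps on GCC and Clang), and >> on a negative value is an
 * arithmetic shift there. */
static int32_t to_jpeg_unsigned(int32_t x)
{
    return (int32_t)((uint32_t)x * 4663U - 9289992U) >> 12;
}
```

The unsigned form only replaces undefined behavior with a defined but wrong value. At x = (1 << 19) - 1 the wrapped result is negative, so an output FFMIN would not catch it. That is why it is only acceptable if such inputs do not occur with real data.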
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".