changes from v3:
- removed left-over FFMIN() on input in lumRangeToJpeg16_c();
- restored cast to signed int before right shift that was mistakenly
  removed in chrRangeToJpeg16_c();
- restored disabling of aarch64 simd functions after changing to new API;
- added test for negative input values;
- fixed {lum,chr}ConvertRange16 for negative input (dropped sse2
  implementation since it does not have pmuldq);
- reordered commits;
- reran all benchmarks;

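For readers unfamiliar with the idea behind removing FFMIN() on the input, here is a rough standalone sketch of a limited-to-full range luma conversion that saturates the output instead. It is not the code from this patchset: the function names, the plain 8-bit constants (16..235 to 0..255) and the Q14 coefficient are assumptions chosen only to keep the example small.

```c
#include <stdint.h>

/* Illustrative only: names and 8-bit constants are hypothetical and are not
 * taken from the patchset. */
static int clip_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Expand limited-range luma (16..235) to full range (0..255) in place. */
static void lum_to_full_sketch(int16_t *dst, int width)
{
    const int shift  = 14;
    const int coeff  = (255 << shift) / 219;              /* ~1.164 in Q14 */
    const int offset = -16 * coeff + (1 << (shift - 1));  /* bias + rounding */

    for (int i = 0; i < width; i++) {
        /* No clamp on the input: it may be out of range or even negative. */
        int t = dst[i] * coeff + offset;
        /* Avoid right-shifting a negative value (implementation-defined in C);
         * anything below zero saturates to 0 anyway. */
        int v = t < 0 ? 0 : t >> shift;
        dst[i] = (int16_t)clip_int(v, 0, 255);
    }
}
```

With this sketch, inputs 16 and 235 map to 0 and 255, while out-of-range inputs such as 255 or -10 saturate to 255 and 0 rather than wrapping around.
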
checkasm --bench for entire patchset:

x86_64:
chrRangeFromJpeg8_1920_c:      2126.5   2114.7  (1.01x)
chrRangeFromJpeg8_1920_sse2:    817.0    814.2  (1.00x)
chrRangeFromJpeg8_1920_avx2:    404.4    405.5  (1.00x)
chrRangeFromJpeg16_1920_c:     2331.4   3153.9  (0.74x)
chrRangeToJpeg8_1920_c:        3163.0   3163.9  (1.00x)
chrRangeToJpeg8_1920_sse2:      814.5    814.8  (1.00x)
chrRangeToJpeg8_1920_avx2:      404.4    405.7  (1.00x)
chrRangeToJpeg16_1920_c:       3163.7   3165.0  (1.00x)
lumRangeFromJpeg8_1920_c:      1262.2   1306.8  (0.97x)
lumRangeFromJpeg8_1920_sse2:    411.9    414.4  (0.99x)
lumRangeFromJpeg8_1920_avx2:    206.9    206.0  (1.00x)
lumRangeFromJpeg16_1920_c:     1079.5   1298.5  (0.83x)
lumRangeToJpeg8_1920_c:        1860.5   1906.0  (0.98x)
lumRangeToJpeg8_1920_sse2:      411.9    412.9  (1.00x)
lumRangeToJpeg8_1920_avx2:      198.9    205.9  (0.97x)
lumRangeToJpeg16_1920_c:       1910.2   1905.0  (1.00x)

aarch64 A55:
chrRangeFromJpeg8_1920_c:     28836.2  28836.8  (1.00x)
chrRangeFromJpeg8_1920_neon:   5312.6   5310.2  (1.00x)
chrRangeFromJpeg16_1920_c:    28840.1  32684.2  (0.88x)
chrRangeToJpeg8_1920_c:       44196.2  23073.2  (1.92x)
chrRangeToJpeg8_1920_neon:     6034.6   5547.4  (1.09x)
chrRangeToJpeg16_1920_c:      36527.3  24996.8  (1.46x)
lumRangeFromJpeg8_1920_c:     15388.5  15383.5  (1.00x)
lumRangeFromJpeg8_1920_neon:   3150.7   3147.4  (1.00x)
lumRangeFromJpeg16_1920_c:    15389.3  17305.2  (0.89x)
lumRangeToJpeg8_1920_c:       23069.7  19226.2  (1.20x)
lumRangeToJpeg8_1920_neon:     3873.2   3627.8  (1.07x)
lumRangeToJpeg16_1920_c:      19227.8  21144.8  (0.91x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:      6334.7   6263.8  (1.01x)
chrRangeFromJpeg8_1920_neon:   2264.5   2307.0  (0.98x)
chrRangeFromJpeg16_1920_c:     6336.0  11523.8  (0.55x)
chrRangeToJpeg8_1920_c:       11474.5   9610.4  (1.19x)
chrRangeToJpeg8_1920_neon:     2646.5   2794.2  (0.95x)
chrRangeToJpeg16_1920_c:       9640.5  11655.2  (0.83x)
lumRangeFromJpeg8_1920_c:      4453.2   4420.8  (1.01x)
lumRangeFromJpeg8_1920_neon:   1104.8   1107.0  (1.00x)
lumRangeFromJpeg16_1920_c:     4414.2   5762.0  (0.77x)
lumRangeToJpeg8_1920_c:        6645.0   5980.8  (1.11x)
lumRangeToJpeg8_1920_neon:     1310.5   1334.0  (0.98x)
lumRangeToJpeg16_1920_c:       6005.2   5946.2  (1.01x)

Ramiro Polla (8):
  checkasm/sw_range_convert: test negative input values
  swscale/range_convert: saturate output instead of limiting input
  swscale/aarch64/range_convert: saturate output instead of limiting input
  swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats
  swscale/x86/range_convert: update sse2 and avx2 range_convert functions to new API
  swscale/aarch64/range_convert: update neon range_convert functions to new API
  swscale/x86: add sse4 and avx2 {lum,chr}ConvertRange16
  swscale/aarch64: add neon {lum,chr}ConvertRange16

 libswscale/aarch64/range_convert_neon.S       | 152 ++++++++++----
 libswscale/aarch64/swscale.c                  |  36 +++-
 libswscale/hscale.c                           |   6 +-
 libswscale/loongarch/swscale_init_loongarch.c |   5 +
 libswscale/riscv/swscale.c                    |   5 +
 libswscale/swscale.c                          | 122 ++++++++++--
 libswscale/swscale_internal.h                 |  26 ++-
 libswscale/x86/range_convert.asm              | 159 ++++++++++-----
 libswscale/x86/swscale.c                      |  50 +++--
 tests/checkasm/sw_range_convert.c             |  82 +++++++-
 .../fate/filter-alphaextract_alphamerge_rgb   | 100 +++++-----
 tests/ref/fate/filter-pixdesc-gray10be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray10le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray12be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray12le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray14be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray14le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray16be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray16le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray9be         |   2 +-
 tests/ref/fate/filter-pixdesc-gray9le         |   2 +-
 tests/ref/fate/filter-pixdesc-ya16be          |   2 +-
 tests/ref/fate/filter-pixdesc-ya16le          |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj411p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj420p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj422p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj440p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj444p        |   2 +-
 tests/ref/fate/filter-pixfmts-copy            |  34 ++--
 tests/ref/fate/filter-pixfmts-crop            |  34 ++--
 tests/ref/fate/filter-pixfmts-field           |  34 ++--
 tests/ref/fate/filter-pixfmts-fieldorder      |  30 +--
 tests/ref/fate/filter-pixfmts-hflip           |  34 ++--
 tests/ref/fate/filter-pixfmts-il              |  34 ++--
 tests/ref/fate/filter-pixfmts-lut             |  18 +-
 tests/ref/fate/filter-pixfmts-null            |  34 ++--
 tests/ref/fate/filter-pixfmts-pad             |  22 +--
 tests/ref/fate/filter-pixfmts-pullup          |  10 +-
 tests/ref/fate/filter-pixfmts-rotate          |   4 +-
 tests/ref/fate/filter-pixfmts-scale           |  34 ++--
 tests/ref/fate/filter-pixfmts-swapuv          |  10 +-
 .../ref/fate/filter-pixfmts-tinterlace_cvlpf  |   8 +-
 .../ref/fate/filter-pixfmts-tinterlace_merge  |   8 +-
 tests/ref/fate/filter-pixfmts-tinterlace_pad  |   8 +-
 tests/ref/fate/filter-pixfmts-tinterlace_vlpf |   8 +-
 tests/ref/fate/filter-pixfmts-transpose       |  28 +--
 tests/ref/fate/filter-pixfmts-vflip           |  34 ++--
 tests/ref/fate/fitsenc-gray                   |   2 +-
 tests/ref/fate/fitsenc-gray16be               |  10 +-
 tests/ref/fate/gifenc-gray                    | 186 +++++++++---------
 tests/ref/fate/idroq-video-encode             |   2 +-
 tests/ref/fate/jpg-icc                        |   8 +-
 tests/ref/fate/sws-yuv-colorspace             |   2 +-
 tests/ref/fate/sws-yuv-range                  |   2 +-
 tests/ref/fate/vvc-conformance-SCALING_A_1    | 128 ++++++------
 tests/ref/lavf/gray16be.fits                  |   4 +-
 tests/ref/lavf/gray16be.pam                   |   4 +-
 tests/ref/lavf/gray16be.png                   |   6 +-
 tests/ref/lavf/jpg                            |   6 +-
 tests/ref/lavf/smjpeg                         |   6 +-
 tests/ref/pixfmt/gbrp-gray                    |   2 +-
 tests/ref/pixfmt/gbrp-gray10be                |   2 +-
 tests/ref/pixfmt/gbrp-gray10le                |   2 +-
 tests/ref/pixfmt/gbrp-gray12be                |   2 +-
 tests/ref/pixfmt/gbrp-gray12le                |   2 +-
 tests/ref/pixfmt/gbrp-gray16be                |   2 +-
 tests/ref/pixfmt/gbrp-gray16le                |   2 +-
 tests/ref/pixfmt/gbrp-yuvj420p                |   2 +-
 tests/ref/pixfmt/gbrp-yuvj422p                |   2 +-
 tests/ref/pixfmt/gbrp-yuvj440p                |   2 +-
 tests/ref/pixfmt/gbrp-yuvj444p                |   2 +-
 tests/ref/pixfmt/gbrp10-gray                  |   2 +-
 tests/ref/pixfmt/gbrp10-gray10be              |   2 +-
 tests/ref/pixfmt/gbrp10-gray10le              |   2 +-
 tests/ref/pixfmt/gbrp10-gray12be              |   2 +-
 tests/ref/pixfmt/gbrp10-gray12le              |   2 +-
 tests/ref/pixfmt/gbrp10-gray16be              |   2 +-
 tests/ref/pixfmt/gbrp10-gray16le              |   2 +-
 tests/ref/pixfmt/gbrp10-yuvj420p              |   2 +-
 tests/ref/pixfmt/gbrp10-yuvj422p              |   2 +-
 tests/ref/pixfmt/gbrp10-yuvj440p              |   2 +-
 tests/ref/pixfmt/gbrp10-yuvj444p              |   2 +-
 tests/ref/pixfmt/gbrp12-gray                  |   2 +-
 tests/ref/pixfmt/gbrp12-gray10be              |   2 +-
 tests/ref/pixfmt/gbrp12-gray10le              |   2 +-
 tests/ref/pixfmt/gbrp12-gray12be              |   2 +-
 tests/ref/pixfmt/gbrp12-gray12le              |   2 +-
 tests/ref/pixfmt/gbrp12-gray16be              |   2 +-
 tests/ref/pixfmt/gbrp12-gray16le              |   2 +-
 tests/ref/pixfmt/gbrp12-yuvj420p              |   2 +-
 tests/ref/pixfmt/gbrp12-yuvj422p              |   2 +-
 tests/ref/pixfmt/gbrp12-yuvj440p              |   2 +-
 tests/ref/pixfmt/gbrp12-yuvj444p              |   2 +-
 tests/ref/pixfmt/gbrp16-gray16be              |   2 +-
 tests/ref/pixfmt/gbrp16-gray16le              |   2 +-
 tests/ref/pixfmt/rgb24-gray                   |   2 +-
 tests/ref/pixfmt/rgb24-gray10be               |   2 +-
 tests/ref/pixfmt/rgb24-gray10le               |   2 +-
 tests/ref/pixfmt/rgb24-gray12be               |   2 +-
 tests/ref/pixfmt/rgb24-gray12le               |   2 +-
 tests/ref/pixfmt/rgb24-gray16be               |   2 +-
 tests/ref/pixfmt/rgb24-gray16le               |   2 +-
 tests/ref/pixfmt/rgb24-yuvj420p               |   2 +-
 tests/ref/pixfmt/rgb24-yuvj422p               |   2 +-
 tests/ref/pixfmt/rgb24-yuvj440p               |   2 +-
 tests/ref/pixfmt/rgb24-yuvj444p               |   2 +-
 tests/ref/pixfmt/rgb48-gray                   |   2 +-
 tests/ref/pixfmt/rgb48-gray10be               |   2 +-
 tests/ref/pixfmt/rgb48-gray10le               |   2 +-
 tests/ref/pixfmt/rgb48-gray12be               |   2 +-
 tests/ref/pixfmt/rgb48-gray12le               |   2 +-
 tests/ref/pixfmt/rgb48-gray16be               |   2 +-
 tests/ref/pixfmt/rgb48-gray16le               |   2 +-
 tests/ref/pixfmt/rgb48-yuvj420p               |   2 +-
 tests/ref/pixfmt/rgb48-yuvj422p               |   2 +-
 tests/ref/pixfmt/rgb48-yuvj440p               |   2 +-
 tests/ref/pixfmt/rgb48-yuvj444p               |   2 +-
 tests/ref/pixfmt/yuv444p-gray10be             |   2 +-
 tests/ref/pixfmt/yuv444p-gray10le             |   2 +-
 tests/ref/pixfmt/yuv444p-gray12be             |   2 +-
 tests/ref/pixfmt/yuv444p-gray12le             |   2 +-
 tests/ref/pixfmt/yuv444p-gray16be             |   2 +-
 tests/ref/pixfmt/yuv444p-gray16le             |   2 +-
 tests/ref/pixfmt/yuv444p-yuvj420p             |   2 +-
 tests/ref/pixfmt/yuv444p-yuvj422p             |   2 +-
 tests/ref/pixfmt/yuv444p-yuvj440p             |   2 +-
 tests/ref/pixfmt/yuv444p10-gray               |   2 +-
 tests/ref/pixfmt/yuv444p10-gray10be           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray10le           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray12be           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray12le           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray16be           |   2 +-
 tests/ref/pixfmt/yuv444p10-gray16le           |   2 +-
 tests/ref/pixfmt/yuv444p10-yuvj420p           |   2 +-
 tests/ref/pixfmt/yuv444p10-yuvj422p           |   2 +-
 tests/ref/pixfmt/yuv444p10-yuvj440p           |   2 +-
 tests/ref/pixfmt/yuv444p10-yuvj444p           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray               |   2 +-
 tests/ref/pixfmt/yuv444p12-gray10be           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray10le           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray12be           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray12le           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray16be           |   2 +-
 tests/ref/pixfmt/yuv444p12-gray16le           |   2 +-
 tests/ref/pixfmt/yuv444p12-yuvj420p           |   2 +-
 tests/ref/pixfmt/yuv444p12-yuvj422p           |   2 +-
 tests/ref/pixfmt/yuv444p12-yuvj440p           |   2 +-
 tests/ref/pixfmt/yuv444p12-yuvj444p           |   2 +-
 tests/ref/pixfmt/yuv444p16-gray16be           |   2 +-
 tests/ref/pixfmt/yuv444p16-gray16le           |   2 +-
 tests/ref/pixfmt/yuvj420p                     |   2 +-
 tests/ref/pixfmt/yuvj422p                     |   2 +-
 tests/ref/pixfmt/yuvj440p                     |   2 +-
 tests/ref/pixfmt/yuvj444p                     |   2 +-
 tests/ref/seek/lavf-jpg                       |   8 +-
 tests/ref/seek/vsynth_lena-mjpeg              |  40 ++--
 tests/ref/seek/vsynth_lena-roqvideo           |   2 +-
 tests/ref/vsynth/vsynth1-amv                  |   8 +-
 tests/ref/vsynth/vsynth1-mjpeg                |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-422            |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-huffman        |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-trell          |   8 +-
 tests/ref/vsynth/vsynth1-mjpeg-trell-huffman  |   8 +-
 tests/ref/vsynth/vsynth1-roqvideo             |   8 +-
 tests/ref/vsynth/vsynth2-amv                  |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg                |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-422            |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-huffman        |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-trell          |   8 +-
 tests/ref/vsynth/vsynth2-mjpeg-trell-huffman  |   8 +-
 tests/ref/vsynth/vsynth2-roqvideo             |   8 +-
 tests/ref/vsynth/vsynth3-amv                  |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg                |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-422            |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth3-mjpeg-huffman        |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-trell          |   6 +-
 tests/ref/vsynth/vsynth3-mjpeg-trell-huffman  |   6 +-
 tests/ref/vsynth/vsynth_lena-amv              |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg            |   8 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-422        |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-444        |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-huffman    |   8 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-trell      |   8 +-
 .../vsynth/vsynth_lena-mjpeg-trell-huffman    |   8 +-
 tests/ref/vsynth/vsynth_lena-roqvideo         |   8 +-
 188 files changed, 1189 insertions(+), 836 deletions(-)

-- 
2.39.5

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".