There is an issue with the constants used in YUV to YUV range conversion, where the upper bound is not respected when converting to mpeg range.
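To make the issue concrete, here is a minimal sketch (not the code from this patchset) of how full-to-limited range constants can be derived from the bit depth. It assumes the usual nominal ranges (16..235 for luma and 16..240 for chroma at 8 bits, scaled up with the bit depth). The names RangeCoeffs, full_to_limited and apply_range are hypothetical, and the 24-bit fixed-point precision is only an illustrative choice:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for fixed-point range conversion constants.
 * This is an illustration only, not a structure from libswscale. */
typedef struct RangeCoeffs {
    int64_t coeff;  /* multiplier, scaled by 2^shift */
    int64_t offset; /* destination minimum plus 0.5, scaled by 2^shift */
    int     shift;  /* fixed-point precision in bits */
} RangeCoeffs;

/* Derive constants for full (jpeg) range -> limited (mpeg) range.
 * Assumption: limited range is 16..235 (luma) or 16..240 (chroma) at
 * 8 bits, shifted left by (bit_depth - 8) for higher bit depths. */
static RangeCoeffs full_to_limited(int bit_depth, int is_chroma)
{
    assert(bit_depth >= 8 && bit_depth <= 16);

    const int     shift   = 24; /* illustrative; larger than any bit depth */
    const int64_t src_max = ((int64_t)1 << bit_depth) - 1;
    const int64_t dst_min = (int64_t)16 << (bit_depth - 8);
    const int64_t dst_max = (int64_t)(is_chroma ? 240 : 235) << (bit_depth - 8);

    RangeCoeffs c;
    c.shift  = shift;
    /* Rounded ratio of the destination span to the source span. */
    c.coeff  = (((dst_max - dst_min) << shift) + src_max / 2) / src_max;
    /* Destination minimum, plus half a unit so the final shift rounds. */
    c.offset = (dst_min << shift) + ((int64_t)1 << (shift - 1));
    return c;
}

/* Apply the conversion to one full-range sample. */
static int apply_range(RangeCoeffs c, int v)
{
    return (int)((v * c.coeff + c.offset) >> c.shift);
}
```

With constants derived this way, the largest full-range input lands exactly on the limited-range upper bound, for example 255 maps to 235 for 8-bit luma and 1023 maps to 940 for 10-bit luma.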
With this patchset, the constants are calculated at runtime, depending on
the bit depth. This approach also makes it easier to understand how the
constants are derived.

These are the speedups for the entire patchset (columns: before, after,
speedup):

x86_64:
chrRangeFromJpeg8_1920_c:      5827.4   5845.2  ( 1.00x)
chrRangeFromJpeg8_1920_sse2:   1945.6   1955.2  ( 1.00x)
chrRangeFromJpeg8_1920_avx2:    992.0    988.9  ( 1.00x)
chrRangeFromJpeg16_1920_c:     5793.2   5809.1  ( 1.00x)
chrRangeToJpeg8_1920_c:       11726.2   9462.2  ( 1.24x)
chrRangeToJpeg8_1920_sse2:     1965.5   1949.9  ( 1.01x)
chrRangeToJpeg8_1920_avx2:      984.2    988.5  ( 1.00x)
chrRangeToJpeg16_1920_c:      10610.8   9261.5  ( 1.15x)
lumRangeFromJpeg8_1920_c:      4165.7   4191.4  ( 0.99x)
lumRangeFromJpeg8_1920_sse2:   1032.0   1040.5  ( 0.99x)
lumRangeFromJpeg8_1920_avx2:    575.2    520.5  ( 1.11x)
lumRangeFromJpeg16_1920_c:     4530.0   4143.4  ( 1.09x)
lumRangeToJpeg8_1920_c:        6044.8   5720.5  ( 1.06x)
lumRangeToJpeg8_1920_sse2:     1034.2   1046.0  ( 0.99x)
lumRangeToJpeg8_1920_avx2:      513.5    540.5  ( 0.95x)
lumRangeToJpeg16_1920_c:       5343.6   5139.5  ( 1.04x)

aarch64 A55:
chrRangeFromJpeg8_1920_c:     28839.3  28834.8  ( 1.00x)
chrRangeFromJpeg8_1920_neon:   5312.2   5313.1  ( 1.00x)
chrRangeFromJpeg16_1920_c:    28843.8  28840.6  ( 1.00x)
chrRangeToJpeg8_1920_c:       44196.1  23072.5  ( 1.92x)
chrRangeToJpeg8_1920_neon:     6035.9   5550.8  ( 1.09x)
chrRangeToJpeg16_1920_c:      36526.7  23075.1  ( 1.58x)
lumRangeFromJpeg8_1920_c:     15384.3  15386.7  ( 1.00x)
lumRangeFromJpeg8_1920_neon:   3148.6   3145.8  ( 1.00x)
lumRangeFromJpeg16_1920_c:    15390.1  15383.8  ( 1.00x)
lumRangeToJpeg8_1920_c:       23066.7  19223.6  ( 1.20x)
lumRangeToJpeg8_1920_neon:     3868.8   3624.9  ( 1.07x)
lumRangeToJpeg16_1920_c:      19224.6  19225.5  ( 1.00x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:      6316.2   6318.5  ( 1.00x)
chrRangeFromJpeg8_1920_neon:   2263.5   2304.2  ( 0.98x)
chrRangeFromJpeg16_1920_c:     6321.9   6323.5  ( 1.00x)
chrRangeToJpeg8_1920_c:       11389.3   9170.0  ( 1.24x)
chrRangeToJpeg8_1920_neon:     2644.2   2793.8  ( 0.95x)
chrRangeToJpeg16_1920_c:       9514.4   9195.6  ( 1.03x)
lumRangeFromJpeg8_1920_c:      4376.0   4425.5  ( 0.99x)
lumRangeFromJpeg8_1920_neon:   1110.8   1105.0  ( 1.01x)
lumRangeFromJpeg16_1920_c:     4437.9   4436.8  ( 1.00x)
lumRangeToJpeg8_1920_c:        6667.0   6017.2  ( 1.11x)
lumRangeToJpeg8_1920_neon:     1327.5   1328.0  ( 1.00x)
lumRangeToJpeg16_1920_c:       6062.5   6017.2  ( 1.01x)

NOTE: SIMD optimizations for x86 and aarch64 have been updated, but
      riscv and loongarch are still missing (and therefore disabled).

NOTE2: the same issue still exists in rgb2yuv conversions, which is not
       addressed in this patchset.

Changes from v1:
- Saturate the output value instead of limiting the input with amax;
- Add more comprehensive benchmarks to commit messages;
- Add comments when disabling code with "#if 0".

Ramiro Polla (16):
  swscale/range_convert: call arch-specific init functions from main init function
  swscale/range_convert: drop redundant conditionals from arch-specific init functions
  swscale/range_convert: indent after previous commit
  checkasm: use FF_ARRAY_ELEMS instead of hardcoding size of arrays
  checkasm/sw_range_convert: use YUV pixel formats instead of YUVJ
  checkasm/sw_range_convert: reduce number of input sizes tested
  checkasm/sw_range_convert: only run benchmarks on largest input width
  checkasm/sw_range_convert: test all supported bit depths
  checkasm/sw_range_convert: indent after previous couple of commits
  swscale/range_convert: saturate output instead of limiting input
  swscale/aarch64/range_convert: saturate output instead of limiting input
  swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats
  swscale/x86/range_convert: update sse2 and avx2 range_convert functions to new API
  swscale/x86: add sse2, sse4, and avx2 {lum,chr}ConvertRange16
  swscale/aarch64/range_convert: update neon range_convert functions to new API
  swscale/aarch64: add neon {lum,chr}ConvertRange16

 libswscale/aarch64/range_convert_neon.S       | 152 ++++++++++----
 libswscale/aarch64/swscale.c                  |  41 +++-
 libswscale/hscale.c                           |   6 +-
 libswscale/loongarch/swscale_init_loongarch.c |  38 ++--
 libswscale/riscv/swscale.c                    |  15 +-
 libswscale/swscale.c                          | 122 ++++++++++--
 libswscale/swscale_internal.h                 |  11 +-
 libswscale/utils.c                            |  10 +-
 libswscale/x86/range_convert.asm              | 161 ++++++++++-----
 libswscale/x86/swscale.c                      |  56 ++++--
 tests/checkasm/sw_gbrp.c                      |  15 +-
 tests/checkasm/sw_range_convert.c             | 186 +++++++++++++-----
 tests/checkasm/sw_scale.c                     |  11 +-
 .../fate/filter-alphaextract_alphamerge_rgb   | 100 +++++-----
 tests/ref/fate/filter-pixdesc-gray10be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray10le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray12be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray12le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray14be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray14le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray16be        |   2 +-
 tests/ref/fate/filter-pixdesc-gray16le        |   2 +-
 tests/ref/fate/filter-pixdesc-gray9be         |   2 +-
 tests/ref/fate/filter-pixdesc-gray9le         |   2 +-
 tests/ref/fate/filter-pixdesc-ya16be          |   2 +-
 tests/ref/fate/filter-pixdesc-ya16le          |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj411p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj420p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj422p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj440p        |   2 +-
 tests/ref/fate/filter-pixdesc-yuvj444p        |   2 +-
 tests/ref/fate/filter-pixfmts-copy            |  34 ++--
 tests/ref/fate/filter-pixfmts-crop            |  34 ++--
 tests/ref/fate/filter-pixfmts-field           |  34 ++--
 tests/ref/fate/filter-pixfmts-fieldorder      |  30 +--
 tests/ref/fate/filter-pixfmts-hflip           |  34 ++--
 tests/ref/fate/filter-pixfmts-il              |  34 ++--
 tests/ref/fate/filter-pixfmts-lut             |  18 +-
 tests/ref/fate/filter-pixfmts-null            |  34 ++--
 tests/ref/fate/filter-pixfmts-pad             |  22 +--
 tests/ref/fate/filter-pixfmts-pullup          |  10 +-
 tests/ref/fate/filter-pixfmts-rotate          |   4 +-
 tests/ref/fate/filter-pixfmts-scale           |  34 ++--
 tests/ref/fate/filter-pixfmts-swapuv          |  10 +-
 .../ref/fate/filter-pixfmts-tinterlace_cvlpf  |   8 +-
 .../ref/fate/filter-pixfmts-tinterlace_merge  |   8 +-
 tests/ref/fate/filter-pixfmts-tinterlace_pad  |   8 +-
 tests/ref/fate/filter-pixfmts-tinterlace_vlpf |   8 +-
 tests/ref/fate/filter-pixfmts-transpose       |  28 +--
 tests/ref/fate/filter-pixfmts-vflip           |  34 ++--
 tests/ref/fate/fitsenc-gray                   |   2 +-
 tests/ref/fate/fitsenc-gray16be               |  10 +-
 tests/ref/fate/gifenc-gray                    | 186 +++++++++---------
 tests/ref/fate/idroq-video-encode             |   2 +-
 tests/ref/fate/jpg-icc                        |   8 +-
 tests/ref/fate/sws-yuv-colorspace             |   2 +-
 tests/ref/fate/sws-yuv-range                  |   2 +-
 tests/ref/fate/vvc-conformance-SCALING_A_1    | 128 ++++++------
 tests/ref/lavf/gray16be.fits                  |   4 +-
 tests/ref/lavf/gray16be.pam                   |   4 +-
 tests/ref/lavf/gray16be.png                   |   6 +-
 tests/ref/lavf/jpg                            |   6 +-
 tests/ref/lavf/smjpeg                         |   6 +-
 tests/ref/pixfmt/yuvj420p                     |   2 +-
 tests/ref/pixfmt/yuvj422p                     |   2 +-
 tests/ref/pixfmt/yuvj440p                     |   2 +-
 tests/ref/pixfmt/yuvj444p                     |   2 +-
 tests/ref/seek/lavf-jpg                       |   8 +-
 tests/ref/seek/vsynth_lena-mjpeg              |  40 ++--
 tests/ref/seek/vsynth_lena-roqvideo           |   2 +-
 tests/ref/vsynth/vsynth1-amv                  |   8 +-
 tests/ref/vsynth/vsynth1-mjpeg                |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-422            |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-huffman        |   6 +-
 tests/ref/vsynth/vsynth1-mjpeg-trell          |   8 +-
 tests/ref/vsynth/vsynth1-mjpeg-trell-huffman  |   8 +-
 tests/ref/vsynth/vsynth1-roqvideo             |   8 +-
 tests/ref/vsynth/vsynth2-amv                  |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg                |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-422            |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-huffman        |   6 +-
 tests/ref/vsynth/vsynth2-mjpeg-trell          |   8 +-
 tests/ref/vsynth/vsynth2-mjpeg-trell-huffman  |   8 +-
 tests/ref/vsynth/vsynth2-roqvideo             |   8 +-
 tests/ref/vsynth/vsynth3-amv                  |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg                |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-422            |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-444            |   6 +-
 tests/ref/vsynth/vsynth3-mjpeg-huffman        |   8 +-
 tests/ref/vsynth/vsynth3-mjpeg-trell          |   6 +-
 tests/ref/vsynth/vsynth3-mjpeg-trell-huffman  |   6 +-
 tests/ref/vsynth/vsynth_lena-amv              |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg            |   8 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-422        |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-444        |   6 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-huffman    |   8 +-
 tests/ref/vsynth/vsynth_lena-mjpeg-trell      |   8 +-
 .../vsynth/vsynth_lena-mjpeg-trell-huffman    |   8 +-
 tests/ref/vsynth/vsynth_lena-roqvideo         |   8 +-
 101 files changed, 1193 insertions(+), 833 deletions(-)

--
2.30.2