On Tue, Dec 3, 2024 at 3:35 AM Michael Niedermayer <mich...@niedermayer.cc> wrote:
> On Sun, Dec 01, 2024 at 07:20:06PM +0100, Ramiro Polla wrote:
> > There is an issue with the constants used in YUV to YUV range conversion,
> > where the upper bound is not respected when converting to mpeg range.
> >
> > With this commit, the constants are calculated at runtime, depending on
> > the bit depth. This approach also allows us to more easily understand how
> > the constants are derived.
> >
> > For bit depths <= 14, the number of fixed point bits has been set to 14
> > for all conversions, to simplify the code.
> > For bit depths > 14, the number of fixed point bits has been raised and
> > set to 18, to allow for the conversion to be accurate enough for the mpeg
> > range to be respected.
> >
> > The convert functions now take the conversion constants (coeff and offset)
> > as function arguments.
> > For bit depths <= 14, coeff is unsigned 16-bit and offset is 32-bit.
> > For bit depths > 14, coeff is unsigned 32-bit and offset is 64-bit.
> >
> > x86_64:
> > chrRangeFromJpeg8_1920_c:    2127.4   2125.0  (1.00x)
> > chrRangeFromJpeg16_1920_c:   2325.2   2127.2  (1.09x)
> > chrRangeToJpeg8_1920_c:      3166.9   3168.7  (1.00x)
> > chrRangeToJpeg16_1920_c:     2152.4   3164.8  (0.68x)
> > lumRangeFromJpeg8_1920_c:    1263.0   1302.5  (0.97x)
> > lumRangeFromJpeg16_1920_c:   1080.5   1299.2  (0.83x)
> > lumRangeToJpeg8_1920_c:      1886.8   2112.2  (0.89x)
> > lumRangeToJpeg16_1920_c:     1077.0   1906.5  (0.56x)
> >
> > aarch64 A55:
> > chrRangeFromJpeg8_1920_c:   28835.2  28835.6  (1.00x)
> > chrRangeFromJpeg16_1920_c:  28839.8  32680.8  (0.88x)
> > chrRangeToJpeg8_1920_c:     23074.7  23075.4  (1.00x)
> > chrRangeToJpeg16_1920_c:    17318.9  24996.0  (0.69x)
> > lumRangeFromJpeg8_1920_c:   15389.7  15384.5  (1.00x)
> > lumRangeFromJpeg16_1920_c:  15388.2  17306.7  (0.89x)
> > lumRangeToJpeg8_1920_c:     19227.8  19226.6  (1.00x)
> > lumRangeToJpeg16_1920_c:    15387.0  21146.3  (0.73x)
> >
> > aarch64 A76:
> > chrRangeFromJpeg8_1920_c:    6324.4   6268.1  (1.01x)
> > chrRangeFromJpeg16_1920_c:   6339.9  11521.5  (0.55x)
> > chrRangeToJpeg8_1920_c:      9656.0   9612.8  (1.00x)
> > chrRangeToJpeg16_1920_c:     6340.4  11651.8  (0.54x)
> > lumRangeFromJpeg8_1920_c:    4422.0   4420.8  (1.00x)
> > lumRangeFromJpeg16_1920_c:   4420.9   5762.0  (0.77x)
> > lumRangeToJpeg8_1920_c:      5949.1   5977.5  (1.00x)
> > lumRangeToJpeg16_1920_c:     4446.8   5946.2  (0.75x)
> >
> > NOTE: all simd optimizations for range_convert have been disabled.
> >       they will be re-enabled when they are fixed for each architecture.
> >
> > NOTE2: the same issue still exists in rgb2yuv conversions, which is not
> >        addressed in this commit.
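
For readers following the description above, here is a minimal sketch of
how a coeff/offset pair for a full range to mpeg range luma conversion
could be derived from the bit depth, using 14 fixed point bits. It is
not the code from the patch: the names RangeCoeffs, luma_to_mpeg_coeffs
and range_convert are hypothetical, it only covers luma at bit depths
8 to 14, and it works directly on sample values, which may differ from
the intermediate representation swscale uses internally.

```c
#include <stdint.h>

/* Fractional bits of the fixed point format (14 for bit depths <= 14). */
#define RANGE_FIXED_BITS 14

/* Hypothetical container; field widths follow the commit message
 * (unsigned 16-bit coeff, 32-bit offset for bit depths <= 14). */
typedef struct RangeCoeffs {
    uint16_t coeff;
    int32_t  offset;
} RangeCoeffs;

/* Hypothetical helper: luma full range -> mpeg range, 8 <= bit_depth <= 14.
 * This is an assumed derivation, not the one used in libswscale. */
static RangeCoeffs luma_to_mpeg_coeffs(int bit_depth)
{
    const int32_t src_max = (1 << bit_depth) - 1;    /* 255 at 8 bits */
    const int32_t dst_min = 16  << (bit_depth - 8);  /* 16 at 8 bits  */
    const int32_t dst_max = 235 << (bit_depth - 8);  /* 235 at 8 bits */
    RangeCoeffs c;

    /* Round the scale factor down so that src_max cannot map above dst_max. */
    c.coeff  = (uint16_t)(((int64_t)(dst_max - dst_min) << RANGE_FIXED_BITS)
                          / src_max);
    /* Target minimum plus one half, so the final shift rounds to nearest. */
    c.offset = (dst_min << RANGE_FIXED_BITS) + (1 << (RANGE_FIXED_BITS - 1));
    return c;
}

/* Apply the conversion to one sample using the precomputed constants. */
static int range_convert(int v, RangeCoeffs c)
{
    return (int)(((int64_t)v * c.coeff + c.offset) >> RANGE_FIXED_BITS);
}
```

Under these assumptions, an 8-bit input of 255 maps to 235 and an input
of 0 maps to 16, so the upper bound of the mpeg range is respected.
Rounding the coefficient down is the design choice that guarantees this.
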
> > ---
> > libswscale/aarch64/swscale.c | 5 +
> > libswscale/hscale.c | 6 +-
> > libswscale/swscale.c | 113 +++++++++--
> > libswscale/swscale_internal.h | 26 ++-
> > libswscale/x86/swscale.c | 5 +
> > tests/checkasm/sw_range_convert.c | 68 ++++++-
> > .../fate/filter-alphaextract_alphamerge_rgb | 100 +++++-----
> > tests/ref/fate/filter-pixdesc-gray10be | 2 +-
> > tests/ref/fate/filter-pixdesc-gray10le | 2 +-
> > tests/ref/fate/filter-pixdesc-gray12be | 2 +-
> > tests/ref/fate/filter-pixdesc-gray12le | 2 +-
> > tests/ref/fate/filter-pixdesc-gray14be | 2 +-
> > tests/ref/fate/filter-pixdesc-gray14le | 2 +-
> > tests/ref/fate/filter-pixdesc-gray16be | 2 +-
> > tests/ref/fate/filter-pixdesc-gray16le | 2 +-
> > tests/ref/fate/filter-pixdesc-gray9be | 2 +-
> > tests/ref/fate/filter-pixdesc-gray9le | 2 +-
> > tests/ref/fate/filter-pixdesc-ya16be | 2 +-
> > tests/ref/fate/filter-pixdesc-ya16le | 2 +-
> > tests/ref/fate/filter-pixdesc-yuvj411p | 2 +-
> > tests/ref/fate/filter-pixdesc-yuvj420p | 2 +-
> > tests/ref/fate/filter-pixdesc-yuvj422p | 2 +-
> > tests/ref/fate/filter-pixdesc-yuvj440p | 2 +-
> > tests/ref/fate/filter-pixdesc-yuvj444p | 2 +-
> > tests/ref/fate/filter-pixfmts-copy | 34 ++--
> > tests/ref/fate/filter-pixfmts-crop | 34 ++--
> > tests/ref/fate/filter-pixfmts-field | 34 ++--
> > tests/ref/fate/filter-pixfmts-fieldorder | 30 +--
> > tests/ref/fate/filter-pixfmts-hflip | 34 ++--
> > tests/ref/fate/filter-pixfmts-il | 34 ++--
> > tests/ref/fate/filter-pixfmts-lut | 18 +-
> > tests/ref/fate/filter-pixfmts-null | 34 ++--
> > tests/ref/fate/filter-pixfmts-pad | 22 +--
> > tests/ref/fate/filter-pixfmts-pullup | 10 +-
> > tests/ref/fate/filter-pixfmts-rotate | 4 +-
> > tests/ref/fate/filter-pixfmts-scale | 34 ++--
> > tests/ref/fate/filter-pixfmts-swapuv | 10 +-
> > .../ref/fate/filter-pixfmts-tinterlace_cvlpf | 8 +-
> > .../ref/fate/filter-pixfmts-tinterlace_merge | 8 +-
> > tests/ref/fate/filter-pixfmts-tinterlace_pad | 8 +-
> > tests/ref/fate/filter-pixfmts-tinterlace_vlpf | 8 +-
> > tests/ref/fate/filter-pixfmts-transpose | 28 +--
> > tests/ref/fate/filter-pixfmts-vflip | 34 ++--
> > tests/ref/fate/fitsenc-gray | 2 +-
> > tests/ref/fate/fitsenc-gray16be | 10 +-
> > tests/ref/fate/gifenc-gray | 186 +++++++++---------
> > tests/ref/fate/idroq-video-encode | 2 +-
> > tests/ref/fate/jpg-icc | 8 +-
> > tests/ref/fate/sws-yuv-colorspace | 2 +-
> > tests/ref/fate/sws-yuv-range | 2 +-
> > tests/ref/fate/vvc-conformance-SCALING_A_1 | 128 ++++++------
> > tests/ref/lavf/gray16be.fits | 4 +-
> > tests/ref/lavf/gray16be.pam | 4 +-
> > tests/ref/lavf/gray16be.png | 6 +-
> > tests/ref/lavf/jpg | 6 +-
> > tests/ref/lavf/smjpeg | 6 +-
> > tests/ref/pixfmt/gbrp-gray | 2 +-
> > tests/ref/pixfmt/gbrp-gray10be | 2 +-
> > tests/ref/pixfmt/gbrp-gray10le | 2 +-
> > tests/ref/pixfmt/gbrp-gray12be | 2 +-
> > tests/ref/pixfmt/gbrp-gray12le | 2 +-
> > tests/ref/pixfmt/gbrp-gray16be | 2 +-
> > tests/ref/pixfmt/gbrp-gray16le | 2 +-
> > tests/ref/pixfmt/gbrp-yuvj420p | 2 +-
> > tests/ref/pixfmt/gbrp-yuvj422p | 2 +-
> > tests/ref/pixfmt/gbrp-yuvj440p | 2 +-
> > tests/ref/pixfmt/gbrp-yuvj444p | 2 +-
> > tests/ref/pixfmt/gbrp10-gray | 2 +-
> > tests/ref/pixfmt/gbrp10-gray10be | 2 +-
> > tests/ref/pixfmt/gbrp10-gray10le | 2 +-
> > tests/ref/pixfmt/gbrp10-gray12be | 2 +-
> > tests/ref/pixfmt/gbrp10-gray12le | 2 +-
> > tests/ref/pixfmt/gbrp10-gray16be | 2 +-
> > tests/ref/pixfmt/gbrp10-gray16le | 2 +-
> > tests/ref/pixfmt/gbrp10-yuvj420p | 2 +-
> > tests/ref/pixfmt/gbrp10-yuvj422p | 2 +-
> > tests/ref/pixfmt/gbrp10-yuvj440p | 2 +-
> > tests/ref/pixfmt/gbrp10-yuvj444p | 2 +-
> > tests/ref/pixfmt/gbrp12-gray | 2 +-
> > tests/ref/pixfmt/gbrp12-gray10be | 2 +-
> > tests/ref/pixfmt/gbrp12-gray10le | 2 +-
> > tests/ref/pixfmt/gbrp12-gray12be | 2 +-
> > tests/ref/pixfmt/gbrp12-gray12le | 2 +-
> > tests/ref/pixfmt/gbrp12-gray16be | 2 +-
> > tests/ref/pixfmt/gbrp12-gray16le | 2 +-
> > tests/ref/pixfmt/gbrp12-yuvj420p | 2 +-
> > tests/ref/pixfmt/gbrp12-yuvj422p | 2 +-
> > tests/ref/pixfmt/gbrp12-yuvj440p | 2 +-
> > tests/ref/pixfmt/gbrp12-yuvj444p | 2 +-
> > tests/ref/pixfmt/gbrp16-gray16be | 2 +-
> > tests/ref/pixfmt/gbrp16-gray16le | 2 +-
> > tests/ref/pixfmt/rgb24-gray | 2 +-
> > tests/ref/pixfmt/rgb24-gray10be | 2 +-
> > tests/ref/pixfmt/rgb24-gray10le | 2 +-
> > tests/ref/pixfmt/rgb24-gray12be | 2 +-
> > tests/ref/pixfmt/rgb24-gray12le | 2 +-
> > tests/ref/pixfmt/rgb24-gray16be | 2 +-
> > tests/ref/pixfmt/rgb24-gray16le | 2 +-
> > tests/ref/pixfmt/rgb24-yuvj420p | 2 +-
> > tests/ref/pixfmt/rgb24-yuvj422p | 2 +-
> > tests/ref/pixfmt/rgb24-yuvj440p | 2 +-
> > tests/ref/pixfmt/rgb24-yuvj444p | 2 +-
> > tests/ref/pixfmt/rgb48-gray | 2 +-
> > tests/ref/pixfmt/rgb48-gray10be | 2 +-
> > tests/ref/pixfmt/rgb48-gray10le | 2 +-
> > tests/ref/pixfmt/rgb48-gray12be | 2 +-
> > tests/ref/pixfmt/rgb48-gray12le | 2 +-
> > tests/ref/pixfmt/rgb48-gray16be | 2 +-
> > tests/ref/pixfmt/rgb48-gray16le | 2 +-
> > tests/ref/pixfmt/rgb48-yuvj420p | 2 +-
> > tests/ref/pixfmt/rgb48-yuvj422p | 2 +-
> > tests/ref/pixfmt/rgb48-yuvj440p | 2 +-
> > tests/ref/pixfmt/rgb48-yuvj444p | 2 +-
> > tests/ref/pixfmt/yuv444p-gray10be | 2 +-
> > tests/ref/pixfmt/yuv444p-gray10le | 2 +-
> > tests/ref/pixfmt/yuv444p-gray12be | 2 +-
> > tests/ref/pixfmt/yuv444p-gray12le | 2 +-
> > tests/ref/pixfmt/yuv444p-gray16be | 2 +-
> > tests/ref/pixfmt/yuv444p-gray16le | 2 +-
> > tests/ref/pixfmt/yuv444p-yuvj420p | 2 +-
> > tests/ref/pixfmt/yuv444p-yuvj422p | 2 +-
> > tests/ref/pixfmt/yuv444p-yuvj440p | 2 +-
> > tests/ref/pixfmt/yuv444p10-gray | 2 +-
> > tests/ref/pixfmt/yuv444p10-gray10be | 2 +-
> > tests/ref/pixfmt/yuv444p10-gray10le | 2 +-
> > tests/ref/pixfmt/yuv444p10-gray12be | 2 +-
> > tests/ref/pixfmt/yuv444p10-gray12le | 2 +-
> > tests/ref/pixfmt/yuv444p10-gray16be | 2 +-
> > tests/ref/pixfmt/yuv444p10-gray16le | 2 +-
> > tests/ref/pixfmt/yuv444p10-yuvj420p | 2 +-
> > tests/ref/pixfmt/yuv444p10-yuvj422p | 2 +-
> > tests/ref/pixfmt/yuv444p10-yuvj440p | 2 +-
> > tests/ref/pixfmt/yuv444p10-yuvj444p | 2 +-
> > tests/ref/pixfmt/yuv444p12-gray | 2 +-
> > tests/ref/pixfmt/yuv444p12-gray10be | 2 +-
> > tests/ref/pixfmt/yuv444p12-gray10le | 2 +-
> > tests/ref/pixfmt/yuv444p12-gray12be | 2 +-
> > tests/ref/pixfmt/yuv444p12-gray12le | 2 +-
> > tests/ref/pixfmt/yuv444p12-gray16be | 2 +-
> > tests/ref/pixfmt/yuv444p12-gray16le | 2 +-
> > tests/ref/pixfmt/yuv444p12-yuvj420p | 2 +-
> > tests/ref/pixfmt/yuv444p12-yuvj422p | 2 +-
> > tests/ref/pixfmt/yuv444p12-yuvj440p | 2 +-
> > tests/ref/pixfmt/yuv444p12-yuvj444p | 2 +-
> > tests/ref/pixfmt/yuv444p16-gray16be | 2 +-
> > tests/ref/pixfmt/yuv444p16-gray16le | 2 +-
> > tests/ref/pixfmt/yuvj420p | 2 +-
> > tests/ref/pixfmt/yuvj422p | 2 +-
> > tests/ref/pixfmt/yuvj440p | 2 +-
> > tests/ref/pixfmt/yuvj444p | 2 +-
> > tests/ref/seek/lavf-jpg | 8 +-
> > tests/ref/seek/vsynth_lena-mjpeg | 40 ++--
> > tests/ref/seek/vsynth_lena-roqvideo | 2 +-
> > tests/ref/vsynth/vsynth1-amv | 8 +-
> > tests/ref/vsynth/vsynth1-mjpeg | 6 +-
> > tests/ref/vsynth/vsynth1-mjpeg-422 | 6 +-
> > tests/ref/vsynth/vsynth1-mjpeg-444 | 6 +-
> > tests/ref/vsynth/vsynth1-mjpeg-huffman | 6 +-
> > tests/ref/vsynth/vsynth1-mjpeg-trell | 8 +-
> > tests/ref/vsynth/vsynth1-mjpeg-trell-huffman | 8 +-
> > tests/ref/vsynth/vsynth1-roqvideo | 8 +-
> > tests/ref/vsynth/vsynth2-amv | 6 +-
> > tests/ref/vsynth/vsynth2-mjpeg | 6 +-
> > tests/ref/vsynth/vsynth2-mjpeg-422 | 6 +-
> > tests/ref/vsynth/vsynth2-mjpeg-444 | 6 +-
> > tests/ref/vsynth/vsynth2-mjpeg-huffman | 6 +-
> > tests/ref/vsynth/vsynth2-mjpeg-trell | 8 +-
> > tests/ref/vsynth/vsynth2-mjpeg-trell-huffman | 8 +-
> > tests/ref/vsynth/vsynth2-roqvideo | 8 +-
> > tests/ref/vsynth/vsynth3-amv | 8 +-
> > tests/ref/vsynth/vsynth3-mjpeg | 8 +-
> > tests/ref/vsynth/vsynth3-mjpeg-422 | 8 +-
> > tests/ref/vsynth/vsynth3-mjpeg-444 | 6 +-
> > tests/ref/vsynth/vsynth3-mjpeg-huffman | 8 +-
> > tests/ref/vsynth/vsynth3-mjpeg-trell | 6 +-
> > tests/ref/vsynth/vsynth3-mjpeg-trell-huffman | 6 +-
> > tests/ref/vsynth/vsynth_lena-amv | 6 +-
> > tests/ref/vsynth/vsynth_lena-mjpeg | 8 +-
> > tests/ref/vsynth/vsynth_lena-mjpeg-422 | 6 +-
> > tests/ref/vsynth/vsynth_lena-mjpeg-444 | 6 +-
> > tests/ref/vsynth/vsynth_lena-mjpeg-huffman | 8 +-
> > tests/ref/vsynth/vsynth_lena-mjpeg-trell | 8 +-
> > .../vsynth/vsynth_lena-mjpeg-trell-huffman | 8 +-
> > tests/ref/vsynth/vsynth_lena-roqvideo | 8 +-
> > 184 files changed, 880 insertions(+), 725 deletions(-)
>
> should be ok if tested and output values are ok
Thanks. I'll commit the patchset in a couple of days if there are no
more comments.

Martin had already ok'd the aarch64 patches. I'd appreciate it if
someone could have a look at the x86 patches.

Ramiro
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".