Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422

2025-03-03 Thread Shreesh Adiga
On Thu, Feb 20, 2025 at 6:51 PM Shreesh Adiga <16567adigashre...@gmail.com> wrote:
>
> Currently the AVX2 version of uyvytoyuv422 in the SIMD loop does the
> following:
> 4 vinsertq to have interleaving of the vector lanes during load from memory.
> 4 vperm2i128 inside 4

[FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: optimize AVX2 version of uyvytoyuv422

2025-02-20 Thread Shreesh Adiga
:3874.4 ( 7.51x)
uyvytoyuv422_avx:       3371.6 ( 8.63x)
uyvytoyuv422_avx2:      2174.6 (13.38x)
uyvytoyuv422_avx512icl: 1625.1 (17.90x)

Signed-off-by: Shreesh Adiga

Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422

2025-02-18 Thread Shreesh Adiga
On Mon, Feb 3, 2025 at 10:03 PM Shreesh Adiga <16567adigashre...@gmail.com> wrote:
>
> The scalar loop is replaced with masked AVX512 instructions.
> For extracting the Y from UYVY, vperm2b is used instead of
> various AND and packuswb.
>
> Instead of loading the vectors w
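For readers unfamiliar with the layout the quoted message refers to, the following is a minimal scalar sketch of splitting one row of packed UYVY into planar Y, U and V. It is an illustration only, not the code from the patch or from libswscale: the function name uyvy_row_to_planar and its parameters are hypothetical, and it assumes the usual UYVY byte order (U0 Y0 V0 Y1) and an even row width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not FFmpeg's implementation):
 * split one row of packed UYVY into planar Y, U and V.
 * Each 4-byte group U0 Y0 V0 Y1 carries two pixels that share one
 * U sample and one V sample. */
void uyvy_row_to_planar(const uint8_t *src, uint8_t *ydst,
                        uint8_t *udst, uint8_t *vdst, size_t width)
{
    assert(width % 2 == 0); /* assumption: even number of pixels per row */

    for (size_t i = 0; i < width / 2; i++) {
        udst[i]         = src[4 * i + 0]; /* shared U            */
        ydst[2 * i + 0] = src[4 * i + 1]; /* Y of the left pixel  */
        vdst[i]         = src[4 * i + 2]; /* shared V            */
        ydst[2 * i + 1] = src[4 * i + 3]; /* Y of the right pixel */
    }
}
```

The Y samples sit at the odd byte offsets of the packed row, which is why a SIMD version can gather them with a single byte permute rather than a mask-and-pack sequence.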

[FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL version of uyvytoyuv422

2025-02-03 Thread Shreesh Adiga
)
uyvytoyuv422_avx2:      2649.8 (10.98x)
uyvytoyuv422_avx512icl: 1615.0 (18.02x)

Signed-off-by: Shreesh Adiga <16567adigashre...@gmail.com>
---
 libswscale/x86/rgb2rgb.c | 6 ++
 libswscale/x86/rgb_2_rgb.asm

Re: [FFmpeg-devel] [PATCH v3] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

2025-01-29 Thread Shreesh Adiga
_avx2:16.2 ( 5.49x)
shuffle_bytes_3210_avx512icl:9.2 ( 9.65x)

I can add the details to the commit message if you can confirm that it is needed.

Thanks,
Shreesh

On Wed, Jan 29, 2025 at 5:46 PM Andreas Rheinhardt <andreas.rheinha...@outlook.com> wrote:
> Shreesh Adiga:
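As context for the function name in the benchmark line above, here is a minimal scalar sketch of a "3210" byte shuffle, under the assumption that each digit names the source byte, within a 4-byte group, that lands at that output position. The function name shuffle_bytes_3210_ref is hypothetical and this is not the code from the patch; it only illustrates the per-group permutation that the SIMD versions perform on many groups at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not FFmpeg's implementation):
 * for every 4-byte group, output byte k is taken from the source byte
 * named by the k-th digit of "3210", i.e. the group is byte-reversed. */
void shuffle_bytes_3210_ref(const uint8_t *src, uint8_t *dst, size_t size)
{
    assert(size % 4 == 0); /* assumption: whole 4-byte groups only */

    for (size_t i = 0; i < size; i += 4) {
        dst[i + 0] = src[i + 3];
        dst[i + 1] = src[i + 2];
        dst[i + 2] = src[i + 1];
        dst[i + 3] = src[i + 0];
    }
}
```

The sketch requires separate source and destination buffers; an in-place call would overwrite bytes before they are read.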

[FFmpeg-devel] [PATCH v3] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

2025-01-28 Thread Shreesh Adiga
Signed-off-by: Shreesh Adiga <16567adigashre...@gmail.com>
---
v3: Fix build failure on older nasm by replacing "kmovw k, tmpw" with "kmov k, tmpd" which matches "kmovw k, r32" syntax.
v2: Tried to align operands and improve indentation for ASM routine

[FFmpeg-devel] [PATCH v2] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

2025-01-25 Thread Shreesh Adiga
Signed-off-by: Shreesh Adiga <16567adigashre...@gmail.com>
---
v2: Tried to align operands and improve indentation for ASM routine.

 libswscale/x86/rgb2rgb.c     | 21 +
 libswscale/x86/rgb_2_rgb.asm | 90 +++-
 2 files changed, 80 insertions(+), 31 del

Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

2025-01-25 Thread Shreesh Adiga
> Try running it several times using the same seed, so
> "tests/checkasm/checkasm --test=sw_rgb --bench 17575157", and make sure
> no power saving feature is enabled (so the CPU frequency doesn't change
> based on load). That may help getting consistent results.

After running "echo performance | t

Re: [FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

2025-01-25 Thread Shreesh Adiga
> Thanks for the patch. Could you please compile and run
> tests/checkasm/checkasm with "--test=sw_rgb --bench" and paste the
> results for the shuffle_bytes functions, to see if there's a speed up
> compared to the AVX2 implementation?

I ran the command "tests/checkasm/checkasm --test=sw_rgb --be

[FFmpeg-devel] [PATCH] swscale/x86/rgb2rgb: add AVX512ICL versions of shuffle_bytes

2025-01-25 Thread Shreesh Adiga
Signed-off-by: Shreesh Adiga <16567adigashre...@gmail.com>
---
 libswscale/x86/rgb2rgb.c     | 21 +
 libswscale/x86/rgb_2_rgb.asm | 28
 2 files changed, 49 insertions(+)

diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c