_avx2:16.2 ( 5.49x)
shuffle_bytes_3210_avx512icl:9.2 ( 9.65x)
I can add these details to the commit message if you confirm that they are needed.
Thanks,
Shreesh
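
For readers unfamiliar with the routine being benchmarked: shuffle_bytes_3210 reverses the byte order within each 4-byte pixel. A minimal scalar sketch of that behaviour follows. The name shuffle_bytes_3210_ref and the assert on the size are assumptions made for illustration and are not taken from the patch; the SIMD versions are expected to produce the same output while handling many bytes per instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference, not code from the patch: reverse the
 * byte order inside every 4-byte group, so output bytes 0..3 come from
 * input bytes 3, 2, 1, 0. */
static void shuffle_bytes_3210_ref(const uint8_t *src, uint8_t *dst,
                                   int src_size)
{
    /* Assumption of this sketch: the buffer holds whole 4-byte pixels. */
    assert(src_size >= 0 && src_size % 4 == 0);

    for (int i = 0; i < src_size; i += 4) {
        dst[i + 0] = src[i + 3];
        dst[i + 1] = src[i + 2];
        dst[i + 2] = src[i + 1];
        dst[i + 3] = src[i + 0];
    }
}
```

The sketch assumes src and dst are separate buffers; calling it in place would overwrite bytes before they are read.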
On Wed, Jan 29, 2025 at 5:46 PM Andreas Rheinhardt <
andreas.rheinha...@outlook.com> wrote:
> Shreesh Adiga:
Signed-off-by: Shreesh Adiga <16567adigashre...@gmail.com>
---
v3: Fix build failure on older nasm by replacing "kmovw k, tmpw"
with "kmovw k, tmpd" which matches "kmovw k, r32" syntax.
v2: Tried to align operands and improve indentation for ASM routine.
Signed-off-by: Shreesh Adiga <16567adigashre...@gmail.com>
---
v2: Tried to align operands and improve indentation for ASM routine.
libswscale/x86/rgb2rgb.c | 21 +
libswscale/x86/rgb_2_rgb.asm | 90 +++-
2 files changed, 80 insertions(+), 31 del
> Try running it several times using the same seed, so
> "tests/checkasm/checkasm --test=sw_rgb --bench 17575157", and make sure
> no power saving feature is enabled (so the CPU frequency doesn't change
> based on load). That may help getting consistent results.
After running "echo performance | t
> Thanks for the patch. Could you please compile and run
> tests/checkasm/checkasm with "--test=sw_rgb --bench" and paste the
> results for the shuffle_bytes functions, to see if there's a speed up
> compared to the AVX2 implementation?
I ran the command "tests/checkasm/checkasm --test=sw_rgb --be
Signed-off-by: Shreesh Adiga <16567adigashre...@gmail.com>
---
libswscale/x86/rgb2rgb.c | 21 +
libswscale/x86/rgb_2_rgb.asm | 28
2 files changed, 49 insertions(+)
diff --git a/libswscale/x86/rgb2rgb.c b/libswscale/x86/rgb2rgb.c