On 1/29/2025 10:03 AM, Shreesh Adiga wrote:
Hi Andreas,

I am not sure that is needed. I can add the data observed on my machine
(AMD 7950X, Zen 4), but I think it will vary from machine to machine. The
speedup is expected to be around 2x compared to AVX2, and there is no core
change apart from processing the scalar loop with masked instructions.
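
For illustration only, here is a rough Python model of that idea. It is not
the patch's assembly. It assumes the scalar loop in question is the one that
handles leftover bytes at the end of the buffer, and that a name such as
"0321" lists the source index for each output byte within a 4-byte group.
The function name and VECTOR_BYTES are made up for this sketch.

```python
VECTOR_BYTES = 64  # width of one 512-bit register, in bytes


def shuffle_bytes(src: bytes, order: str) -> bytes:
    """Reorder every 4-byte group of src according to order (e.g. "0321")."""
    if len(src) % 4 != 0:
        raise ValueError("src length must be a multiple of 4")
    idx = [int(ch) for ch in order]
    dst = bytearray(len(src))
    for base in range(0, len(src), VECTOR_BYTES):
        # Lanes past the end of the buffer are masked off, so the final
        # partial vector takes the same path as the full ones and no
        # separate scalar tail loop is needed.
        live = min(VECTOR_BYTES, len(src) - base)
        mask = [lane < live for lane in range(VECTOR_BYTES)]
        for lane in range(VECTOR_BYTES):
            if mask[lane]:
                group = lane & ~3  # first byte of this lane's 4-byte group
                dst[base + lane] = src[base + group + idx[lane & 3]]
    return bytes(dst)


print(list(shuffle_bytes(bytes(range(8)), "0321")))
```

In this model a 68-byte buffer is handled as one full vector plus one vector
with only 4 live lanes, rather than one vector plus a scalar loop.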

The data is not entirely consistent with my expectations. All the shuffle
variants do equivalent work, yet the speedups in the report vary from one
variant to the next.

shuffle_bytes_0321_c:                                   56.5 ( 1.00x)
shuffle_bytes_0321_ssse3:                               15.2 ( 3.70x)
shuffle_bytes_0321_avx2:                                10.2 ( 5.51x)
shuffle_bytes_0321_avx512icl:                            9.2 ( 6.11x)
shuffle_bytes_1230_c:                                   84.5 ( 1.00x)
shuffle_bytes_1230_ssse3:                               14.2 ( 5.93x)
shuffle_bytes_1230_avx2:                                15.2 ( 5.54x)
shuffle_bytes_1230_avx512icl:                           11.2 ( 7.51x)
shuffle_bytes_2103_c:                                   48.5 ( 1.00x)
shuffle_bytes_2103_ssse3:                               21.2 ( 2.28x)
shuffle_bytes_2103_avx2:                                13.8 ( 3.53x)
shuffle_bytes_2103_avx512icl:                            9.2 ( 5.24x)
shuffle_bytes_3012_c:                                   84.5 ( 1.00x)
shuffle_bytes_3012_ssse3:                               14.2 ( 5.93x)
shuffle_bytes_3012_avx2:                                16.2 ( 5.20x)
shuffle_bytes_3012_avx512icl:                           10.2 ( 8.24x)
shuffle_bytes_3210_c:                                   89.2 ( 1.00x)
shuffle_bytes_3210_ssse3:                               24.2 ( 3.68x)
shuffle_bytes_3210_avx2:                                16.2 ( 5.49x)
shuffle_bytes_3210_avx512icl:                            9.2 ( 9.65x)
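
To compare these numbers against the expected gain of around 2x over AVX2,
the AVX-512 to AVX2 ratio can be worked out per variant. The short Python
sketch below does only that arithmetic; the timings are copied from the
report above and the helper name is made up for this example.

```python
# Timings copied from the report above (lower is faster).
timings = {
    "0321": {"c": 56.5, "ssse3": 15.2, "avx2": 10.2, "avx512icl": 9.2},
    "1230": {"c": 84.5, "ssse3": 14.2, "avx2": 15.2, "avx512icl": 11.2},
    "2103": {"c": 48.5, "ssse3": 21.2, "avx2": 13.8, "avx512icl": 9.2},
    "3012": {"c": 84.5, "ssse3": 14.2, "avx2": 16.2, "avx512icl": 10.2},
    "3210": {"c": 89.2, "ssse3": 24.2, "avx2": 16.2, "avx512icl": 9.2},
}


def avx512_gain_over_avx2(row):
    """Return how many times faster the avx512icl entry is than avx2."""
    return row["avx2"] / row["avx512icl"]


gains = {name: avx512_gain_over_avx2(row) for name, row in timings.items()}
for name, gain in gains.items():
    print(f"shuffle_bytes_{name}: {gain:.2f}x")
```

On these figures the ratio ranges from about 1.11x (0321) to about 1.76x
(3210), so every variant is below 2x on this machine and the spread between
variants is visible directly.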

I can add the details to the commit message if you confirm that it is needed.

Thanks,
Shreesh

Added the benchmarks and pushed the patch. Thanks.


_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
