On 1/29/2025 10:03 AM, Shreesh Adiga wrote:
Hi Andreas,

I am not sure if that is needed. I can add the data observed on my machine (AMD 7950X, Zen 4), but I think this will vary from machine to machine. It is expected to be around 2x compared to AVX2, and there is no core change apart from processing the scalar loop with masked instructions.

The data doesn't look entirely consistent with my expectations. All the shuffle variants are equivalent in the work they do, yet the speedups in the report are not consistent.

shuffle_bytes_0321_c:          56.5 ( 1.00x)
shuffle_bytes_0321_ssse3:      15.2 ( 3.70x)
shuffle_bytes_0321_avx2:       10.2 ( 5.51x)
shuffle_bytes_0321_avx512icl:   9.2 ( 6.11x)
shuffle_bytes_1230_c:          84.5 ( 1.00x)
shuffle_bytes_1230_ssse3:      14.2 ( 5.93x)
shuffle_bytes_1230_avx2:       15.2 ( 5.54x)
shuffle_bytes_1230_avx512icl:  11.2 ( 7.51x)
shuffle_bytes_2103_c:          48.5 ( 1.00x)
shuffle_bytes_2103_ssse3:      21.2 ( 2.28x)
shuffle_bytes_2103_avx2:       13.8 ( 3.53x)
shuffle_bytes_2103_avx512icl:   9.2 ( 5.24x)
shuffle_bytes_3012_c:          84.5 ( 1.00x)
shuffle_bytes_3012_ssse3:      14.2 ( 5.93x)
shuffle_bytes_3012_avx2:       16.2 ( 5.20x)
shuffle_bytes_3012_avx512icl:  10.2 ( 8.24x)
shuffle_bytes_3210_c:          89.2 ( 1.00x)
shuffle_bytes_3210_ssse3:      24.2 ( 3.68x)
shuffle_bytes_3210_avx2:       16.2 ( 5.49x)
shuffle_bytes_3210_avx512icl:   9.2 ( 9.65x)

I can add the details to the commit message if you can confirm that it is needed.

Thanks,
Shreesh
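
For reference, the AVX-512 gain over AVX2 can be recomputed directly from the timings in the report. The following is only a minimal sketch: the numbers are copied from the report as printed, while the names `timings` and `gain_over_avx2` are illustrative and not part of FFmpeg or its benchmark tooling.

```python
# Timings copied from the report above (lower is better; units as printed
# by the benchmark). The dictionary layout is an illustrative assumption.
timings = {
    "0321": {"c": 56.5, "ssse3": 15.2, "avx2": 10.2, "avx512icl": 9.2},
    "1230": {"c": 84.5, "ssse3": 14.2, "avx2": 15.2, "avx512icl": 11.2},
    "2103": {"c": 48.5, "ssse3": 21.2, "avx2": 13.8, "avx512icl": 9.2},
    "3012": {"c": 84.5, "ssse3": 14.2, "avx2": 16.2, "avx512icl": 10.2},
    "3210": {"c": 89.2, "ssse3": 24.2, "avx2": 16.2, "avx512icl": 9.2},
}


def gain_over_avx2(variant):
    """Return how many times faster avx512icl is than avx2 for one variant."""
    t = timings[variant]
    return t["avx2"] / t["avx512icl"]


# Compute the ratio for every shuffle variant in the report.
gains = {variant: gain_over_avx2(variant) for variant in timings}

for variant, gain in gains.items():
    print(f"shuffle_bytes_{variant}: avx512icl is {gain:.2f}x faster than avx2")
```

With these numbers the ratio ranges from roughly 1.1x (0321) to roughly 1.8x (3210), so every variant falls short of the expected 2x and the spread between variants is wide.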
Added the benchmarks and pushed the patch. Thanks.