Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

Pádraig Brady Mon, 25 Nov 2024 16:11:09 -0800

On 25/11/2024 23:27, Sam Russell wrote:

The intrinsics guide is a nice find. I dug a bit deeper into the Intel®
Architecture Instruction Set Extensions and Future Features Programming
Reference [1] from March 2018, and it shows the four variants:


VEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
CPUID feature flag: VPCLMULQDQ

EVEX.NDS.128.66.0F3A.WIG 44 /r /ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
CPUID feature flag: AVX512VL, VPCLMULQDQ

EVEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
CPUID feature flag: AVX512VL, VPCLMULQDQ

EVEX.NDS.512.66.0F3A.WIG 44 /r /ib VPCLMULQDQ zmm1, zmm2, zmm3/m512, imm8
CPUID feature flag: AVX512F, VPCLMULQDQ

So the VPCLMULQDQ opcode needs AVX512VL and VPCLMULQDQ to be encoded with
the EVEX prefix on xmm/ymm registers, or AVX512F and VPCLMULQDQ to use zmm,
but only VPCLMULQDQ for the VEX-encoded 256-bit (ymm) form. The build flags
for the cksum_avx2 object are `-mpclmul -mavx -mavx2 -mvpclmulqdq`, so the
absence of any AVX-512 flags should ensure it compiles to VEX and not
EVEX.


Thanks for all the investigation.
However, I don't see any changes in CFLAGS or the __builtin_cpu_supports()
checks between the first patch and this one. Am I missing something?
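
For reference, here is a rough sketch of how the feature requirements quoted
above could map onto runtime checks. It is not the code from the patch: the
enum and function names are hypothetical, and the extra "avx2" test for the
VEX path is an assumption based on the `-mavx2` build flag. It assumes a GCC
or Clang recent enough to accept "vpclmulqdq" as a feature name.

```c
/* Hypothetical names, for illustration only; not taken from the patch.  */
enum clmul_path { CLMUL_NONE, CLMUL_VEX_YMM, CLMUL_EVEX_ZMM };

/* Pick the widest VPCLMULQDQ form the running CPU reports support for.  */
static enum clmul_path
pick_clmul_path (void)
{
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("vpclmulqdq"))
    {
      /* zmm form: AVX512F + VPCLMULQDQ, per the table above.  */
      if (__builtin_cpu_supports ("avx512f"))
        return CLMUL_EVEX_ZMM;
      /* VEX ymm form: VPCLMULQDQ alone per the table; AVX2 is checked
         too because the object is built with -mavx2 (assumption).  */
      if (__builtin_cpu_supports ("avx2"))
        return CLMUL_VEX_YMM;
    }
#endif
  return CLMUL_NONE;
}

/* Human-readable label for a path, e.g. for debug output.  */
static const char *
clmul_path_name (enum clmul_path p)
{
  switch (p)
    {
    case CLMUL_EVEX_ZMM: return "evex-zmm";
    case CLMUL_VEX_YMM:  return "vex-ymm";
    default:             return "none";
    }
}
```

On a non-x86 build the function simply reports CLMUL_NONE, so a caller would
fall back to the generic code.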

Also, I was wondering how parameterizable the new code is.
I.e., would it be easy to parameterize it to support -a crc32b?
In my previous notes on the gnulib list I summarized the differences as:

cksum -a crc parameters:
------------------------
Polynomial: 04C11DB7
Initial Value: 00000000
Final XOR Value: 00000000
Reverse data: no
Reverse crc (before xor): no

cksum -a crc32b (gnulib crc32) equivalent parameters:
-----------------------------------------------------
Polynomial: 04C11DB7
Initial Value: FFFFFFFF
Final XOR Value: FFFFFFFF
Reverse data: yes
Reverse crc (before xor): yes
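
To make the five parameters concrete, here is a minimal bit-at-a-time sketch
that takes them as inputs. It is only a reference model, not the PCLMUL
folding code from the patch; the struct and function names are hypothetical.
The two parameter sets are copied from the lists above as given. The sketch
models only those five parameters, so it is not expected to reproduce the
full `cksum -a crc` output, which also folds in the input length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block mirroring the five fields listed above.  */
struct crc32_params
{
  uint32_t poly;      /* Polynomial */
  uint32_t init;      /* Initial Value */
  uint32_t xorout;    /* Final XOR Value */
  bool reflect_in;    /* Reverse data */
  bool reflect_out;   /* Reverse crc (before xor) */
};

/* Values as listed in this message.  */
static const struct crc32_params cksum_crc_params =
  { 0x04C11DB7u, 0x00000000u, 0x00000000u, false, false };
static const struct crc32_params crc32b_params =
  { 0x04C11DB7u, 0xFFFFFFFFu, 0xFFFFFFFFu, true, true };

/* Reverse the low NBITS bits of V.  */
static uint32_t
reflect_bits (uint32_t v, int nbits)
{
  uint32_t r = 0;
  for (int i = 0; i < nbits; i++)
    {
      r = (r << 1) | (v & 1);
      v >>= 1;
    }
  return r;
}

/* Bitwise, MSB-first CRC over BUF using the parameters in P.  */
static uint32_t
crc32_generic (const struct crc32_params *p,
               const unsigned char *buf, size_t len)
{
  uint32_t crc = p->init;
  for (size_t i = 0; i < len; i++)
    {
      uint32_t byte = buf[i];
      if (p->reflect_in)
        byte = reflect_bits (byte, 8);
      crc ^= byte << 24;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ p->poly : crc << 1;
    }
  if (p->reflect_out)
    crc = reflect_bits (crc, 32);
  return crc ^ p->xorout;
}
```

With crc32b_params, the 9-byte input "123456789" gives CBF43926, the usual
check value for that CRC-32 variant. A vectorized version would need its
folding constants and bit ordering adjusted for the reflected case, which is
the part the question above is about.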

cheers,
Pádraig
