On Mon, Nov 25, 2024 at 5:31 PM Sam Russell <sam.h.russ...@gmail.com> wrote:
>
> Results thanks to Jeff
>
> srussell@icelake:~$ time ./cksum_bench_pclmul 1048575 10000
> Hash: 5B9DA0F4, length: 1048575
>
> real 0m3.561s
> user 0m3.535s
> sys 0m0.026s
> srussell@icelake:~$ time ./cksum_bench_avx2 1048575 10000
> Hash: 5B9DA0F4, length: 1048575
>
> real 0m2.083s
> user 0m2.047s
> sys 0m0.036s
> srussell@icelake:~$ time ./cksum_bench_avx512 1048575 10000
> Hash: 5B9DA0F4, length: 1048575
>
> real 0m1.353s
> user 0m1.320s
> sys 0m0.033s
>
> Zero code change in the algorithm so we're effectively testing whether I've
> calculated the constants correctly and whether I'm loading the previous CRC
> into the correct part of the AVX register.
>
> Attached patch has Pádraig's feedback plus the new runtime check that will
> enable the AVX2 version if avx512f is specified but the avx512_supported()
> check has failed (because vpclmulqdq isn't set). I would appreciate if anyone
> has a definitive answer on the correct way to test for avx2+vpclmulqdq vs
> avx512+vpclmulqdq, and whether any chip exists that supports a subset avx512
> but also doesn't support vpclmulqdq on avx2...

I don't believe you will encounter avx2+vpclmulqdq. According to the Intel
Intrinsics Guide,[1] vpclmulqdq is an AVX512 feature. If you have AVX512,
then AVX2 is available to you as a proper subset. You won't find AVX2 plus
a few AVX512 features; that combination will not show up on AVX2-only
machines like Skylake or Kaby Lake.

The guide also says which flags to check. You should check for
VPCLMULQDQ+AVX512VL _if_ you are using the vpclmulqdq ymm, ymm, ymm, imm8
form. You should check for VPCLMULQDQ alone _if_ you are using the
vpclmulqdq zmm, zmm, zmm, imm8 form.

[1] <https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=vpclmulqdq>

Jeff
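
P.S. For illustration only, the per-form checks described above could be written as two small predicates. This is a rough sketch, not code from the attached patch: the struct and function names are hypothetical, the detection helper assumes a GCC or Clang build on x86 with `__builtin_cpu_supports`, and the zmm predicate also requires avx512f, which is a conservative assumption added here because zmm registers are not usable without it.

```c
#include <stdbool.h>

/* Hypothetical container for the feature flags of interest.
   Not taken from the patch.  */
struct cpu_flags
{
  bool avx512f;     /* AVX-512 Foundation (zmm registers) */
  bool avx512vl;    /* AVX-512 Vector Length extensions */
  bool vpclmulqdq;  /* vector carry-less multiply */
};

/* ymm form: the VPCLMULQDQ+AVX512VL rule as stated in this message.  */
bool
can_use_vpclmulqdq_ymm (struct cpu_flags f)
{
  return f.vpclmulqdq && f.avx512vl;
}

/* zmm form: VPCLMULQDQ, plus avx512f as an added conservative check
   (an assumption of this sketch, not part of the rule quoted above).  */
bool
can_use_vpclmulqdq_zmm (struct cpu_flags f)
{
  return f.vpclmulqdq && f.avx512f;
}

#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
/* Fill in the flags for the running CPU using the compiler builtin.  */
struct cpu_flags
detect_cpu_flags (void)
{
  struct cpu_flags f;
  f.avx512f = __builtin_cpu_supports ("avx512f") != 0;
  f.avx512vl = __builtin_cpu_supports ("avx512vl") != 0;
  f.vpclmulqdq = __builtin_cpu_supports ("vpclmulqdq") != 0;
  return f;
}
#endif
```

A caller would pass the result of detect_cpu_flags() to each predicate and fall back to the existing pclmul path when both return false.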