Results thanks to Jeff

srussell@icelake:~$ time ./cksum_bench_pclmul 1048575 10000
Hash: 5B9DA0F4, length: 1048575

real    0m3.561s
user    0m3.535s
sys     0m0.026s
srussell@icelake:~$ time ./cksum_bench_avx2 1048575 10000
Hash: 5B9DA0F4, length: 1048575

real    0m2.083s
user    0m2.047s
sys     0m0.036s
srussell@icelake:~$ time ./cksum_bench_avx512 1048575 10000
Hash: 5B9DA0F4, length: 1048575

real    0m1.353s
user    0m1.320s
sys     0m0.033s

Zero code change in the algorithm, so we're effectively testing whether
I've calculated the constants correctly and whether I'm loading the
previous CRC into the correct part of the AVX register.

Attached patch has Pádraig's feedback plus the new runtime check that
will enable the AVX2 version if avx512f is specified but the
avx512_supported() check has failed (because vpclmulqdq isn't set).

I would appreciate it if anyone has a definitive answer on the correct
way to test for avx2+vpclmulqdq vs avx512+vpclmulqdq, and whether any
chip exists that supports a subset of avx512 but also doesn't support
vpclmulqdq on avx2...

On Mon, 25 Nov 2024 at 19:29, Sam Russell <sam.h.russ...@gmail.com> wrote:

> Thanks, sent key off-list
>
> I also think I've been confusing myself, the benchmark program doesn't
> check the flags.
> I think I will need to change the logic though, here's the lscpu from
> my Ryzen with AVX2
>
> Architecture:        x86_64
> CPU op-mode(s):      32-bit, 64-bit
> Address sizes:       48 bits physical, 48 bits virtual
> Byte Order:          Little Endian
> CPU(s):              12
> On-line CPU(s) list: 0-11
> Vendor ID:           AuthenticAMD
> Model name:          AMD Ryzen 5 5600 6-Core Processor
> CPU family:          25
> Model:               33
> Thread(s) per core:  2
> Core(s) per socket:  6
> Socket(s):           1
> Stepping:            2
> BogoMIPS:            6986.86
> Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
>                      pge mca cmov pat pse36 clflush mmx fxsr sse sse2
>                      ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
>                      constant_tsc rep_good nopl tsc_reliable
>                      nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3
>                      fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx
>                      f16c rdrand hypervisor lahf_lm cmp_legacy
>                      cr8_legacy abm sse4a misalignsse 3dnowprefetch
>                      osvw topoext ibrs ibpb stibp vmmcall fsgsbase
>                      bmi1 avx2 smep bmi2 erms rdseed adx smap
>                      clflushopt clwb sha_ni xsaveopt xsavec xgetbv1
>                      xsaves clzero xsaveerptr arat umip vaes
>                      vpclmulqdq rdpid fsrm
>
> So it does set vpclmulqdq but doesn't set avx512. Jeff's CPU has both
> avx512f and vpclmulqdq, and the skylake on EC2 has avx512f but does NOT
> have vpclmulqdq. This might mean that we'll want AVX2 on any AVX2
> processor with vpclmulqdq, and any AVX512 processor that does NOT have
> vpclmulqdq set, does that seem logical?
>
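For discussion, here is a minimal sketch of one possible selection order. It is not the logic in the attached patch. It assumes the AVX2 code path uses the 256-bit form of VPCLMULQDQ and therefore needs the vpclmulqdq flag as well as avx2, which may or may not match what the patch requires. The names crc_impl, cpu_flags, choose_crc_impl and detect_cpu_flags are hypothetical, and the feature strings assume a GCC or Clang recent enough to accept "vpclmulqdq" in __builtin_cpu_supports.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none are taken from the patch.  */
enum crc_impl { CRC_GENERIC, CRC_PCLMUL, CRC_AVX2, CRC_AVX512 };

struct cpu_flags
{
  bool pclmul;      /* shown as "pclmulqdq" by lscpu */
  bool avx2;
  bool avx512f;
  bool vpclmulqdq;
};

/* Pure decision function: pick the widest path whose assumed
   prerequisites are all present, otherwise fall back one step.
   ASSUMPTION: both vector paths need vpclmulqdq.  */
static enum crc_impl
choose_crc_impl (struct cpu_flags f)
{
  if (f.avx512f && f.vpclmulqdq)
    return CRC_AVX512;
  if (f.avx2 && f.vpclmulqdq)
    return CRC_AVX2;
  if (f.pclmul)
    return CRC_PCLMUL;
  return CRC_GENERIC;
}

/* Query the running CPU with the compiler builtin on x86.
   On any other target every flag stays false, so the generic
   path is selected.  */
static struct cpu_flags
detect_cpu_flags (void)
{
  struct cpu_flags f = { false, false, false, false };
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  f.pclmul = __builtin_cpu_supports ("pclmul") != 0;
  f.avx2 = __builtin_cpu_supports ("avx2") != 0;
  f.avx512f = __builtin_cpu_supports ("avx512f") != 0;
  f.vpclmulqdq = __builtin_cpu_supports ("vpclmulqdq") != 0;
#endif
  return f;
}
```

Under these assumptions, choose_crc_impl (detect_cpu_flags ()) would pick the AVX2 path on the Ryzen above and the AVX512 path on Jeff's machine, and would drop back to plain pclmul on a CPU that has avx512f but not vpclmulqdq. Keeping the decision separate from the detection makes it easy to try other orderings against the flag sets reported in this thread.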
0001-cksum-Use-AVX2-and-AVX512-for-speedup.patch
Description: Binary data