Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-27 Thread Michael Stone
FWIW (since all the avx512 discussion seemed to involve Intel CPUs), the new code seems to work fine on AMD zen4 & zen5 in avx512 mode.

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-26 Thread Sam Russell
Nice find protecting the flag checks, thanks for your help. Thanks also to Jeff for giving me access to test on a few different machines with various levels of avx2/avx512 support.
On Tue, Nov 26, 2024, 17:19 Pádraig Brady wrote:
> On 26/11/2024 12:19, Sam Russell wrote:
> > > Now I think what

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-26 Thread Pádraig Brady
On 26/11/2024 12:19, Sam Russell wrote: > Now I think what you are saying is there was no SIGILL with the adjusted cksum, and that issue was only with the less protected benchmarking code. Correct, the benchmarking code has zero protections, and the servers where I got SIGILL were not setting

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-26 Thread Sam Russell
> Now I think what you are saying is there was no SIGILL with the adjusted cksum, and that issue was only with the less protected benchmarking code.
Correct, the benchmarking code has zero protections, and the servers where I got SIGILL were not setting the VPCLMULQDQ flag, so cksum will catch this

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-26 Thread Sam Russell
I'm comfortable with enabling AVX512 by default. If we can find a CPU that sets the VPCLMULQDQ flag but doesn't implement the VPCLMULQDQ opcode then that's probably going to be an issue that affects much more than coreutils.
On Tue, 26 Nov 2024 at 12:59, Pádraig Brady wrote:
> On 26/11/2024 07:35, Sa
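
The flag-guarded dispatch discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the enum, the function names, and the exact flag combinations are assumptions made for the example. The only real API used is GCC's `__builtin_cpu_init` and `__builtin_cpu_supports`, and the "vpclmulqdq" feature name needs a reasonably recent compiler.

```c
#include <stdbool.h>

/* Hypothetical labels for the code paths discussed in this thread.  */
enum crc_impl { CRC_GENERIC, CRC_PCLMUL, CRC_AVX2, CRC_AVX512 };

/* Pick the widest implementation whose CPUID feature flags are all set.
   The flag combinations below are an assumption for illustration.  */
static enum crc_impl
pick_crc_impl (void)
{
#if defined __GNUC__ && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  bool vpclmul = __builtin_cpu_supports ("vpclmulqdq");
  if (vpclmul && __builtin_cpu_supports ("avx512f")
      && __builtin_cpu_supports ("avx512bw"))
    return CRC_AVX512;
  if (vpclmul && __builtin_cpu_supports ("avx2"))
    return CRC_AVX2;
  if (__builtin_cpu_supports ("pclmul") && __builtin_cpu_supports ("avx"))
    return CRC_PCLMUL;
#endif
  /* Non-x86 targets, or CPUs without the flags, use the portable path.  */
  return CRC_GENERIC;
}

/* Human-readable name, e.g. for a debug message.  */
static const char *
crc_impl_name (enum crc_impl impl)
{
  switch (impl)
    {
    case CRC_AVX512: return "avx512";
    case CRC_AVX2:   return "avx2";
    case CRC_PCLMUL: return "pclmul";
    default:         return "generic";
    }
}
```

Code that skips a check like this and executes VPCLMULQDQ on a CPU without it is what produces SIGILL, which matches what was seen with the unprotected benchmark.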

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-26 Thread Pádraig Brady
On 26/11/2024 07:35, Sam Russell wrote: > However I don't see any changes in CFLAGS or builtin_cpu_supports() checks > between the first and this patch. Am I missing something? CFLAGS stayed the same because the compiler output is fine (my PC here doesn't have AVX512 but it has a recent gcc t

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Sam Russell
> However I don't see any changes in CFLAGS or builtin_cpu_supports() checks
> between the first and this patch. Am I missing something?
CFLAGS stayed the same because the compiler output is fine (my PC here doesn't have AVX512 but it has a recent gcc that can build AVX512 instructions). It's poss

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Pádraig Brady
On 25/11/2024 23:27, Sam Russell wrote: The intrinsics guide is a nice find, I dug a bit deeper into the Intel® Architecture Instruction Set Extensions and Future Features Programming Reference [1] from March 2018 and it shows the 4 variants: VEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ym

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Sam Russell
The intrinsics guide is a nice find, I dug a bit deeper into the Intel® Architecture Instruction Set Extensions and Future Features Programming Reference [1] from March 2018 and it shows the 4 variants: VEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8 CPUID feature flag: V

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Jeffrey Walton
On Mon, Nov 25, 2024 at 5:31 PM Sam Russell wrote:
>
> Results thanks to Jeff
>
> srussell@icelake:~$ time ./cksum_bench_pclmul 1048575 1
> Hash: 5B9DA0F4, length: 1048575
>
> real 0m3.561s
> user 0m3.535s
> sys 0m0.026s
> srussell@icelake:~$ time ./cksum_bench_avx2 1048575 1
> H

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Sam Russell
Results thanks to Jeff
srussell@icelake:~$ time ./cksum_bench_pclmul 1048575 1
Hash: 5B9DA0F4, length: 1048575
real 0m3.561s
user 0m3.535s
sys 0m0.026s
srussell@icelake:~$ time ./cksum_bench_avx2 1048575 1
Hash: 5B9DA0F4, length: 1048575
real 0m2.083s
user 0m2.047s
sys

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Sylvestre Ledru
Hello,
On 25/11/2024 at 17:04, Sam Russell wrote:
> I've added a sample benchmarking program to measure the difference without hitting disk, looking like a 40% speedup
> $ time ./cksum_bench_pclmul 1048576 1
> Hash: EFA0B24F, length: 1048576
> real 0m3.018s
> user 0m3.018s
> sys 0m0.000s

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Sam Russell
Thanks, sent key off-list.
I also think I've been confusing myself: the benchmark program doesn't check the flags. I think I will need to change the logic though; here's the lscpu from my Radeon with AVX2:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes:

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Jeffrey Walton
On Mon, Nov 25, 2024 at 11:09 AM Sam Russell wrote:
>
> I've added a sample benchmarking program to measure the difference without
> hitting disk, looking like a 40% speedup
>
> $ time ./cksum_bench_pclmul 1048576 1
> Hash: EFA0B24F, length: 1048576
>
> real 0m3.018s
> user 0m3.018s
> sy

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Sam Russell
Actually, looking over https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html shows this:
‘icelake-client’
Intel Ice Lake Client CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL, FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MO

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Sam Russell
> Impressive. What CPU was that exactly.
AMD Ryzen 5 5600 6-Core Processor
> There is a copy/paste issue:
> Also `make syntax-check` indicates some lines are > 80 chars.
> This improvement should be added to NEWS.
Thanks, will fix these
> What compiler version are you using?
$ gcc -v
Using bui

Re: [PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Pádraig Brady
On 25/11/2024 16:04, Sam Russell wrote:
> I've added a sample benchmarking program to measure the difference without hitting disk, looking like a 40% speedup
> $ time ./cksum_bench_pclmul 1048576 1
> Hash: EFA0B24F, length: 1048576
> real 0m3.018s
> user 0m3.018s
> sys 0m0.000s
> $ time ./cksu

[PATCH] cksum: Use AVX2 and AVX512 for speedup

2024-11-25 Thread Sam Russell
I've added a sample benchmarking program to measure the difference without hitting disk, looking like a 40% speedup
$ time ./cksum_bench_pclmul 1048576 1
Hash: EFA0B24F, length: 1048576
real 0m3.018s
user 0m3.018s
sys 0m0.000s
$ time ./cksum_bench_avx2 1048576 1
Hash: EFA0B24F,
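
For readers who want to try a similar in-memory measurement, here is a rough sketch of a benchmark of this kind. It is not the benchmark program from the patch: the function names, the buffer fill pattern, and the iteration parameter are all hypothetical, and a plain bitwise POSIX cksum CRC stands in for the PCLMUL/AVX2/AVX512 kernels. The idea is only to time repeated checksums over a buffer that is already in memory, so disk I/O does not affect the result.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Fold one byte into a POSIX cksum CRC (polynomial 0x04C11DB7, MSB first).  */
static uint32_t
crc_update_byte (uint32_t crc, unsigned char byte)
{
  crc ^= (uint32_t) byte << 24;
  for (int bit = 0; bit < 8; bit++)
    crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
  return crc;
}

/* Reference (slow) POSIX cksum CRC: the data, then the length with its
   least significant byte first, then a final complement.  */
static uint32_t
cksum_crc_bitwise (const unsigned char *buf, size_t len)
{
  uint32_t crc = 0;
  for (size_t i = 0; i < len; i++)
    crc = crc_update_byte (crc, buf[i]);
  for (size_t n = len; n != 0; n >>= 8)
    crc = crc_update_byte (crc, (unsigned char) (n & 0xFF));
  return ~crc;
}

/* Hypothetical harness: checksum an in-memory buffer ITERATIONS times and
   return the CPU seconds used, or a negative value if allocation fails.  */
static double
bench_seconds (size_t len, unsigned long iterations, uint32_t *hash_out)
{
  unsigned char *buf = malloc (len ? len : 1);
  if (!buf)
    return -1.0;
  for (size_t i = 0; i < len; i++)
    buf[i] = (unsigned char) i;   /* arbitrary, repeatable fill pattern */

  uint32_t hash = 0;
  clock_t start = clock ();
  for (unsigned long it = 0; it < iterations; it++)
    hash = cksum_crc_bitwise (buf, len);
  clock_t end = clock ();

  free (buf);
  *hash_out = hash;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Swapping `cksum_crc_bitwise` for a vectorized kernel and comparing the returned times for the same length and iteration count would give a like-for-like comparison, provided the hash values agree.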