FWIW (since all the AVX512 discussion seemed to involve Intel CPUs), the
new code seems to work fine on AMD Zen 4 and Zen 5 in AVX512 mode.
Nice find protecting the flag checks, thanks for your help.
Thanks also to Jeff for giving me access to test on a few different
machines with various levels of AVX2/AVX512 support.
On Tue, Nov 26, 2024, 17:19 Pádraig Brady wrote:
On 26/11/2024 12:19, Sam Russell wrote:
> Now I think what you are saying is there was no SIGILL with the adjusted cksum,
> and that issue was only with the less protected benchmarking code.
Correct, the benchmarking code has zero protections, and the servers where I
got SIGILL were not setting the VPCLMULQDQ flag, so cksum will catch this.
I'm comfortable with enabling AVX512 by default. If we can find a CPU that
sets the VPCLMULQDQ flag but doesn't implement the VPCLMULQDQ opcode then
that's probably going to be an issue that affects much more than coreutils.
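For readers following the flag discussion, here is a minimal sketch of the kind of runtime guard being described, assuming GCC or Clang on x86. The enum and the function names `pick_crc_impl` and `impl_name` are hypothetical and are not taken from the coreutils patch; only the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are real compiler features.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the identifiers used in cksum. */
typedef enum { IMPL_GENERIC, IMPL_PCLMUL, IMPL_AVX2, IMPL_AVX512 } crc_impl;

/* Pick the widest implementation whose CPUID flags are all set.
   On non-x86 targets or other compilers this always falls back to
   the generic code path. */
static crc_impl
pick_crc_impl (void)
{
#if (defined __GNUC__ || defined __clang__) \
    && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  bool vpclmul = __builtin_cpu_supports ("vpclmulqdq") != 0;

  /* The wide variants need VPCLMULQDQ in addition to the vector width. */
  if (vpclmul && __builtin_cpu_supports ("avx512f")
      && __builtin_cpu_supports ("avx512bw"))
    return IMPL_AVX512;
  if (vpclmul && __builtin_cpu_supports ("avx2"))
    return IMPL_AVX2;
  if (__builtin_cpu_supports ("pclmul") && __builtin_cpu_supports ("avx"))
    return IMPL_PCLMUL;
#endif
  return IMPL_GENERIC;
}

/* Human-readable name, handy for debug output. */
static const char *
impl_name (crc_impl impl)
{
  switch (impl)
    {
    case IMPL_AVX512: return "avx512";
    case IMPL_AVX2:   return "avx2";
    case IMPL_PCLMUL: return "pclmul";
    default:          return "generic";
    }
}
```

A benchmark without such a guard jumps straight into the vector code, which is how a machine that does not set the VPCLMULQDQ flag ends up with SIGILL.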
On Tue, 26 Nov 2024 at 12:59, Pádraig Brady wrote:
On 26/11/2024 07:35, Sam Russell wrote:
> However I don't see any changes in CFLAGS or builtin_cpu_supports() checks
> between the first and this patch. Am I missing something?
CFLAGS stayed the same because the compiler output is fine (my PC here
doesn't have AVX512 but it has a recent gcc that can build AVX512
instructions). It's poss
On 25/11/2024 23:27, Sam Russell wrote:
The intrinsics guide is a nice find. I dug a bit deeper into the Intel®
Architecture Instruction Set Extensions and Future Features Programming
Reference [1] from March 2018, and it shows the 4 variants:
VEX.NDS.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
CPUID feature flag: V
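As background for the encodings listed above, a portable sketch of what one 128-bit lane of the instruction computes: a carry-less (XOR-based) multiply of one 64-bit half from each source, with imm8 bit 0 selecting the half of the first source and imm8 bit 4 the half of the second. The `u128` type and the function names are made up for this illustration; real code would use the hardware instruction behind a CPU flag check.

```c
#include <stdint.h>

/* Hypothetical 128-bit container for this sketch. */
typedef struct { uint64_t lo, hi; } u128;

/* Carry-less multiply of two 64-bit values: like schoolbook
   multiplication, but partial products are combined with XOR. */
static u128
clmul64 (uint64_t a, uint64_t b)
{
  u128 r = { 0, 0 };
  for (int i = 0; i < 64; i++)
    if ((b >> i) & 1)
      {
        r.lo ^= a << i;
        if (i != 0)
          r.hi ^= a >> (64 - i);   /* bits shifted out of the low half */
      }
  return r;
}

/* One 128-bit lane: imm8 bit 0 picks the low/high qword of x,
   imm8 bit 4 picks the low/high qword of y. */
static u128
pclmulqdq_lane (u128 x, u128 y, unsigned imm8)
{
  uint64_t a = (imm8 & 0x01) ? x.hi : x.lo;
  uint64_t b = (imm8 & 0x10) ? y.hi : y.lo;
  return clmul64 (a, b);
}
```

The 256-bit and 512-bit forms apply the same operation independently to each 128-bit lane, which is where the speedup for CRC folding comes from.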
On Mon, Nov 25, 2024 at 5:31 PM Sam Russell wrote:
Results thanks to Jeff
srussell@icelake:~$ time ./cksum_bench_pclmul 1048575 1
Hash: 5B9DA0F4, length: 1048575
real    0m3.561s
user    0m3.535s
sys     0m0.026s
srussell@icelake:~$ time ./cksum_bench_avx2 1048575 1
Hash: 5B9DA0F4, length: 1048575
real    0m2.083s
user    0m2.047s
sys
Hello,
On 25/11/2024 at 17:04, Sam Russell wrote:
I've added a sample benchmarking program to measure the difference without
hitting disk, looking like a 40% speedup
$ time ./cksum_bench_pclmul 1048576 1
Hash: EFA0B24F, length: 1048576
real    0m3.018s
user    0m3.018s
sys     0m0.000s
Thanks, sent key off-list
I also think I've been confusing myself: the benchmark program doesn't
check the flags. I think I will need to change the logic though. Here's the
lscpu output from my Ryzen with AVX2:
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Address sizes:
On Mon, Nov 25, 2024 at 11:09 AM Sam Russell wrote:
>
> I've added a sample benchmarking program to measure the difference without
> hitting disk, looking like a 40% speedup
>
> $ time ./cksum_bench_pclmul 1048576 1
> Hash: EFA0B24F, length: 1048576
>
> real    0m3.018s
> user    0m3.018s
> sy
Actually, looking over https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
shows this
‘icelake-client’
Intel Ice Lake Client CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3,
SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE, PCLMUL,
FSGSBASE, RDRND, F16C, AVX2, BMI, BMI2, LZCNT, FMA, MO
> Impressive. What CPU was that exactly?
AMD Ryzen 5 5600 6-Core Processor
> There is a copy/paste issue:
> Also `make syntax-check` indicates some lines are > 80 chars.
> This improvement should be added to NEWS.
Thanks, will fix these
> What compiler version are you using?
$ gcc -v
Using bui
On 25/11/2024 16:04, Sam Russell wrote:
I've added a sample benchmarking program to measure the difference without
hitting disk, looking like a 40% speedup
$ time ./cksum_bench_pclmul 1048576 1
Hash: EFA0B24F, length: 1048576
real    0m3.018s
user    0m3.018s
sys     0m0.000s
$ time ./cksum_bench_avx2 1048576 1
Hash: EFA0B24F,
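The benchmark program itself is not shown in these snippets, so the following is only a sketch of the idea of timing a checksum over an in-memory buffer without touching disk. It uses a plain bitwise POSIX cksum CRC rather than the pclmul/avx2 code, and the function names, the fill pattern and the meaning of the arguments are all assumptions. It will not reproduce the hashes printed above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Bitwise CRC update with the POSIX cksum polynomial 0x04C11DB7
   (MSB first, no reflection).  Slow, but easy to check. */
static uint32_t
crc_update (uint32_t crc, const unsigned char *buf, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      crc ^= (uint32_t) buf[i] << 24;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
  return crc;
}

/* POSIX cksum: CRC of the data, then of the length (least significant
   byte first), then complemented. */
static uint32_t
posix_cksum (const unsigned char *buf, size_t len)
{
  uint32_t crc = crc_update (0, buf, len);
  for (size_t n = len; n != 0; n >>= 8)
    {
      unsigned char byte = n & 0xFF;
      crc = crc_update (crc, &byte, 1);
    }
  return ~crc;
}

/* Hypothetical benchmark helper: checksum a LEN-byte buffer ITERATIONS
   times and return the CPU seconds used, or -1.0 on allocation failure.
   The buffer contents are an arbitrary pattern chosen for this sketch. */
static double
bench_cksum (size_t len, unsigned iterations, uint32_t *hash_out)
{
  unsigned char *buf = malloc (len ? len : 1);
  if (!buf)
    return -1.0;
  for (size_t i = 0; i < len; i++)
    buf[i] = (unsigned char) (i & 0xFF);

  uint32_t hash = 0;
  clock_t start = clock ();
  for (unsigned i = 0; i < iterations; i++)
    hash = posix_cksum (buf, len);
  clock_t end = clock ();

  free (buf);
  *hash_out = hash;
  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Swapping `posix_cksum` for a vectorized implementation and comparing the returned times for the same `len` and `iterations` gives the kind of comparison reported in this thread.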