On Fri, Dec 09, 2016 at 09:30:11AM +0100, Peter Zijlstra wrote:
> +static inline u64 mul_u32_u32(u32 a, u32 b)
> +{
> +	u64 ret;
> +
> +	asm ("mull %[b]" : "=A" (ret) : [a] "a" (a), [b] "g" (b) );
> +
> +	return ret;
> +}
ARGH, that's broken on x86_64, it needs to be:

	u32 high, low;

	asm ("mull %[b]" : "=a" (low), "=d" (high)
			 : [a] "a" (a), [b] "g" (b) );

	return low | ((u64)high) << 32;

The 'A' constraint doesn't work right.

And with that all the benchmark results are borken too.

root@ivb-ep:~/spinlocks# for i in -m64 -m32 -mx32 ; do echo $i; gcc -O3 $i -o mult mult.c -lm; ./mult; done
-m64
cond: avg: 7.474872 +- 0.008302
uncond: avg: 9.116401 +- 0.008468
128: avg: 0.826584 +- 0.005514
-m32
cond: avg: 16.604030 +- 0.009808
uncond: avg: 13.115470 +- 0.004452
-mx32
cond: avg: 6.168156 +- 0.006650
uncond: avg: 7.202092 +- 0.006813
128: avg: 0.081809 +- 0.008440
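For anyone who wants to poke at this outside the kernel tree, here is a rough userspace sketch of the split-register version. As far as I understand the GCC constraint documentation, 'A' only names the edx:eax pair for a double-word value; on x86_64 a u64 is a single word, so the compiler may place it in either rax or rdx alone, which does not match what mull writes. The sketch makes a few assumptions that are not in the patch: it uses <stdint.h> types in place of the kernel's u32/u64, it uses an "rm" constraint for b instead of "g" (mull takes no immediate operand, so "g" can fail to assemble once a constant gets propagated in), and it adds a plain C fallback for non-x86 builds.

```c
#include <stdint.h>

/*
 * Userspace sketch of the corrected helper; uint32_t/uint64_t stand in
 * for the kernel's u32/u64 (an assumption made for this example only).
 */
static inline uint64_t mul_u32_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	uint32_t high, low;

	/*
	 * mull multiplies EAX by its operand and leaves the 64-bit product
	 * in EDX:EAX, so the two halves are bound to "=d" and "=a".
	 * "rm" (not "g") keeps the compiler from picking an immediate.
	 */
	__asm__ ("mull %[b]"
		 : "=a" (low), "=d" (high)
		 : [a] "a" (a), [b] "rm" (b));

	return low | ((uint64_t)high << 32);
#else
	/* Fallback for other compilers/architectures: plain widening multiply. */
	return (uint64_t)a * b;
#endif
}
```

Calling mul_u32_u32(0xffffffff, 0xffffffff) is a quick sanity check, since the product only fits if both the low and high halves are picked up correctly.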