On Mon, May 23, 2016 at 09:53:00 -0700, Richard Henderson wrote:
> On 05/21/2016 01:42 PM, Emilio G. Cota wrote:
> > In the process, the atomic_rcu_read/set were converted to implement
> > consume/release semantics, respectively. This is inefficient; for
> > correctness and maximum performance we only need an smp_read_barrier_depends
> > for reads, and an smp_wmb for writes. Fix it by using the original
> > definition of these two primitives for all compilers.
>
> For what host do you think this is inefficient?
>
> In particular, what you've done is going to be less efficient for e.g.
> armv8, where the __atomic formulation is going to produce load-acquire and
> store-release instructions. Whereas the separate barriers are going to
> produce two insns.
>
> As for the common case of x86_64, what you're doing is going to make no
> difference at all.
>
> So what are you trying to improve?
Precisely: I tested this on ARMv8. The goal is to not emit a fence at
all, i.e. to emit a plain load (LDR) instead of a load-acquire (LDAR).

I just realised that under #ifdef __ATOMIC we have:

  #define smp_read_barrier_depends() \
      ({ barrier(); __atomic_thread_fence(__ATOMIC_CONSUME); barrier(); })

Why? This should instead be:

  #ifdef __alpha__
  #define smp_read_barrier_depends()   asm volatile("mb":::"memory")
  #endif

unconditionally, i.e. regardless of whether the compiler supports the
__atomic builtins, since only Alpha needs a hardware barrier to order
dependent loads. My patch should have included this additional change to
make sense. Sorry for the confusion.

		E.

PS. And really, equating smp_wmb/rmb to release/acquire fences, as we do
under #ifdef __ATOMIC, is hard to justify other than to please tsan.