On Mon, May 23, 2016 at 16:21:36 +0200, Paolo Bonzini wrote:
> On 21/05/2016 22:42, Emilio G. Cota wrote:
> > Commit a0aa44b4 ("include/qemu/atomic.h: default to __atomic functions")
> > set all atomics to default (on recent GCC versions) to __atomic primitives.
> >
> > In the process, the atomic_rcu_read/set were converted to implement
> > consume/release semantics, respectively. This is inefficient; for
> > correctness and maximum performance we only need an smp_barrier_depends
> > for reads, and an smp_wmb for writes. Fix it by using the original
> > definition of these two primitives for all compilers.
>
> Indeed most compilers implement consume the same as acquire, which is
> inefficient.
> However, isn't in practice atomic_thread_fence(release) +
> atomic_store(relaxed) the same as atomic_store(release)?

Yes. However, this is not the issue I'm addressing with the patch.

The performance regression I measured is due to using load-acquire
vs. load+smp_read_barrier_depends(). In the latter case only Alpha
will emit a fence; in the former we always emit a load-acquire, which
is "stronger" (i.e. more constraining).

A similar thing applies to atomic_rcu_set, although I haven't
measured its impact. We only need smp_wmb+store, yet we emit a
store-release, which is again "stronger".

		E.
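
To make the comparison above concrete, here is a minimal C11 sketch of the two writer variants and the two reader variants. It is not the QEMU code: struct config, global_cfg and the four function names are hypothetical, and the compiler-only fence in read_dependent() is just my assumption of a rough stand-in for smp_read_barrier_depends() on non-Alpha hosts. That reader relies on the hardware honouring address dependencies, which the C11 memory model itself does not promise.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical RCU-protected object, for illustration only. */
struct config {
    int value;
};

/* Hypothetical shared pointer; static storage starts out as NULL. */
static _Atomic(struct config *) global_cfg;

/* Writer, variant A: a single store-release. */
static void publish_release(struct config *c)
{
    assert(c != NULL);
    atomic_store_explicit(&global_cfg, c, memory_order_release);
}

/* Writer, variant B: release fence followed by a relaxed store. */
static void publish_fence(struct config *c)
{
    assert(c != NULL);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&global_cfg, c, memory_order_relaxed);
}

/* Reader, variant A: load-acquire orders every later access. */
static int read_acquire(void)
{
    struct config *c =
        atomic_load_explicit(&global_cfg, memory_order_acquire);
    return c ? c->value : -1;
}

/*
 * Reader, variant B: relaxed load plus a compiler-only fence.
 * Assumption: this approximates load + smp_read_barrier_depends()
 * on hosts other than Alpha. It depends on the CPU ordering the
 * dependent access c->value after the pointer load, which is outside
 * what C11 formally guarantees.
 */
static int read_dependent(void)
{
    struct config *c =
        atomic_load_explicit(&global_cfg, memory_order_relaxed);
    atomic_signal_fence(memory_order_acquire); /* no CPU fence emitted */
    return c ? c->value : -1;
}
```

A caller would fill in a struct config, hand it to one of the publish_*() functions, and then call one of the read_*() functions from a reader thread. Comparing the generated assembly for the two readers on a weakly ordered host (e.g. ARM or PPC) should show where the acquire variant costs more.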