On Mon, May 23, 2016 at 16:21:36 +0200, Paolo Bonzini wrote:
> On 21/05/2016 22:42, Emilio G. Cota wrote:
> > Commit a0aa44b4 ("include/qemu/atomic.h: default to __atomic functions")
> > set all atomics to default (on recent GCC versions) to __atomic primitives.
> > 
> > In the process, the atomic_rcu_read/set were converted to implement
> > consume/release semantics, respectively. This is inefficient; for
> > correctness and maximum performance we only need an smp_read_barrier_depends
> > for reads, and an smp_wmb for writes. Fix it by using the original
> > definition of these two primitives for all compilers.
> 
> Indeed most compilers implement consume the same as acquire, which is
> inefficient.
> However, isn't in practice atomic_thread_fence(release) +
> atomic_store(relaxed) the same as atomic_store(release)?

Yes. However this is not the issue I'm addressing with the patch.

The performance regression I measured is due to using load-acquire vs.
load+smp_read_barrier_depends(). In the latter case only Alpha will
emit a fence; in the former we always emit a load-acquire, which
is "stronger" (i.e. more constraining).

A similar thing applies to atomic_rcu_set, although I haven't
measured its impact. We only need smp_wmb+store, yet we emit a
store-release, which is again "stronger".
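
To make the comparison concrete, here is a rough sketch of the two
flavours of each primitive, written with the GCC __atomic builtins.
This is not the actual atomic.h code: the sketch_* macros and the
function names are made up for illustration, and sketch_wmb() uses a
release fence as a conservative stand-in for smp_wmb (which can be
cheaper than that on some hosts).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the barrier macros (assumptions, not QEMU's
 * real definitions). Only Alpha needs a real fence after a dependent load;
 * everywhere else a compiler barrier is enough. */
#if defined(__alpha__)
#define sketch_read_barrier_depends() __asm__ __volatile__("mb" ::: "memory")
#else
#define sketch_read_barrier_depends() __asm__ __volatile__("" ::: "memory")
#endif

/* Conservative approximation of a write barrier. */
#define sketch_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

struct config {
    int value;
};

/* Reader, "original" flavour: relaxed load + dependency barrier. */
static struct config *sketch_rcu_read_dep(struct config **pp)
{
    struct config *p = __atomic_load_n(pp, __ATOMIC_RELAXED);
    sketch_read_barrier_depends();
    return p;
}

/* Reader, stronger flavour: load-acquire, which is what consume
 * usually gets promoted to. */
static struct config *sketch_rcu_read_acquire(struct config **pp)
{
    return __atomic_load_n(pp, __ATOMIC_ACQUIRE);
}

/* Writer, "original" flavour: write barrier, then a relaxed store. */
static void sketch_rcu_set_wmb(struct config **pp, struct config *p)
{
    assert(p != NULL);
    sketch_wmb();
    __atomic_store_n(pp, p, __ATOMIC_RELAXED);
}

/* Writer, stronger flavour: store-release. */
static void sketch_rcu_set_release(struct config **pp, struct config *p)
{
    assert(p != NULL);
    __atomic_store_n(pp, p, __ATOMIC_RELEASE);
}
```

Both flavours return/publish the same pointer; the difference is only
in which ordering constraints the compiler and CPU must honour, and
therefore in what gets emitted on weakly ordered hosts.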

                E.
