Hi Paolo, On Mon, Apr 4, 2016 at 4:14 AM, Paolo Bonzini <pbonz...@redhat.com> wrote: > > The issue is that atomic_thread_fence() only affects other atomic > operations, while smp_rmb() and smp_wmb() affect normal loads and stores > as well.
That is different from what I understand what atomic_thread_fence() does. AFAIK, it should order all loads/stores and not only atomic operations. Quoting http://www.inf.pucrs.br/~flash/progeng2/cppreference/w/cpp/atomic/atomic_thread_fencehtml.html: "Establishes memory synchronization ordering of non-atomic and relaxed atomic accesses" > > In the GCC implementation, atomic operations (even relaxed ones) access > memory as if the pointer was volatile. By doing this, GCC can remove > the acquire and release fences altogether on TSO architectures. We > actually observed a case where the compiler subsequently inverted the > order of two writes around a smp_wmb(). That is a GCC bug in that case. GCC should remove smp_wmb() only after the optimization passes have run in the codegen phase. Can you please point me to the mailing list thread/bug where this was discovered? Or is it easily reproducible with reverting that patch? In that case, the location of the bug will be helpful to analyse this further. > In principle it could do the same on architectures that are sequentially > consistent; even if none exists in practice, keeping the barriers for > smp_mb() is consistent with the other barriers. I understand one barrier(), but having two on both sides seems unnecessary. I would prefer we clean up and have just one. Although, I think it is just an annoyance since performance wise there is no difference. Two consecutive barrier()'s should behave just like one. Thanks! -- Pranith