Hi Alex,

I have one question inline below.
Alex Bennée writes:

> The __atomic primitives have been available since GCC 4.7 and provide
> a richer interface for describing memory ordering requirements. As a
> bonus by using the primitives instead of hand-rolled functions we can
> use tools such as the AddressSanitizer which need the use of well
> defined APIs for its analysis.
>
> If we have __ATOMIC defines we exclusively use the __atomic primitives
> for all our atomic access. Otherwise we fall back to the mixture of
> __sync and hand-rolled barrier cases.
>
> +/* For C11 atomic ops */
> +
> +/* Manual memory barriers
> + *
> + *__atomic_thread_fence does not include a compiler barrier; instead,
> + * the barrier is part of __atomic_load/__atomic_store's "volatile-like"
> + * semantics. If smp_wmb() is a no-op, absence of the barrier means that
> + * the compiler is free to reorder stores on each side of the barrier.
> + * Add one here, and similarly in smp_rmb() and smp_read_barrier_depends().
> + */
> +
> +#define smp_mb() ({ barrier(); __atomic_thread_fence(__ATOMIC_SEQ_CST); barrier(); })

I could not understand why we need to wrap the fence in barrier() calls.
There are three parts to my confusion; let me ask them one at a time.

First, these primitives are used in the qemu codebase, which runs on the
host architecture. Let us consider two example architectures: x86 and
ARM. On x86, __atomic_thread_fence(__ATOMIC_SEQ_CST) generates an mfence
instruction; on ARM, it generates a dmb instruction. Both of these
serializing instructions also act as compiler barriers. Is there any
architecture on which no such serializing instruction is generated?

> +#define smp_wmb() ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
> +#define smp_rmb() ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })

Second, why do you need barrier() on both sides? A single barrier() seems
sufficient to prevent the compiler from reordering across the macro. Am I
missing something?
Finally, I looked through the GCC docs but could find nothing saying that
__atomic_thread_fence() does not act as a compiler barrier. What I did
find says that it is treated as a function call during the main
optimization stages, but not during the later stages:

http://www.spinics.net/lists/gcchelp/msg39798.html

AFAIU, in those later stages even adding a barrier() as we are doing
would have no effect. Can you point me to any docs that discuss this
further?

Thanks!
-- 
Pranith