On 01/04/2016 22:35, Pranith Kumar wrote:
>> [...]; barrier(); })
>
> I could not really understand why we need to wrap the fence with
> barrier()'s. There are three parts to my confusion. Let me ask one after the
> other.
>
> On x86, __atomic_thread_fence(__ATOMIC_SEQ_CST) will generate an mfence
> instruction. On ARM, this will generate the dmb instruction. Both these
> serializing instructions also act as compiler barriers. Is there any
> architecture which does not generate such a serializing instruction?

(More on this later).

>> +#define smp_wmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
>> +#define smp_rmb()   ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })
>
> Second, why do you need barrier() on both sides? One barrier() seems to be
> sufficient to prevent the compiler from reordering across the macro. Am I
> missing something?

Yes, that's true.

> Finally, I tried looking at the gcc docs but could find nothing regarding
> __atomic_thread_fence() not being considered as a memory barrier. What I did
> find mentions about it being treated as a function call during the main
> optimization stages and not during later stages:
>
> http://www.spinics.net/lists/gcchelp/msg39798.html
>
> AFAIU, in these later stages, even adding a barrier() as we are doing will
> have no effect.
>
> Can you point me to any docs which talk more about this?

The issue is that atomic_thread_fence() only affects other atomic
operations, while smp_rmb() and smp_wmb() affect normal loads and stores
as well.

In the GCC implementation, atomic operations (even relaxed ones) access
memory as if the pointer was volatile. By doing this, GCC can remove the
acquire and release fences altogether on TSO architectures. We actually
observed a case where the compiler subsequently inverted the order of
two writes around a smp_wmb(). It was fixed in commit 3bbf572 ("atomics:
add explicit compiler fence in __atomic memory barriers", 2015-06-05).

In principle it could do the same on architectures that are sequentially
consistent; even if none exists in practice, keeping the barriers for
smp_mb() is consistent with the other barriers.

Paolo
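
For illustration only, here is a minimal sketch of how the smp_wmb() and
smp_rmb() macros quoted above might be used to order a normal store against
a flag. It assumes GCC or Clang (statement expressions, inline asm and the
__atomic builtins). The barrier() definition shown is the usual empty asm
with a "memory" clobber and is an assumption, since the thread does not
quote it. The names payload, ready, publish() and try_consume() are
hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition: a compiler-only barrier. It emits no instruction,
 * but the compiler may not move memory accesses across it. */
#define barrier()  __asm__ __volatile__("" ::: "memory")

/* Same shape as the macros quoted in the thread. */
#define smp_wmb()  ({ barrier(); __atomic_thread_fence(__ATOMIC_RELEASE); barrier(); })
#define smp_rmb()  ({ barrier(); __atomic_thread_fence(__ATOMIC_ACQUIRE); barrier(); })

static int payload;  /* normal (non-atomic) data */
static int ready;    /* flag, accessed with relaxed atomics */

/* Writer side (hypothetical helper): store the data, then set the flag. */
static void publish(int value)
{
    payload = value;  /* normal store */
    smp_wmb();        /* keep the payload store ahead of the flag store */
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

/* Reader side (hypothetical helper): returns 1 and fills *out once the
 * flag has been observed, 0 otherwise. */
static int try_consume(int *out)
{
    assert(out != NULL);
    if (!__atomic_load_n(&ready, __ATOMIC_RELAXED))
        return 0;
    smp_rmb();        /* keep the payload load after the flag load */
    *out = payload;   /* normal load */
    return 1;
}
```

In this sketch publish() would run in one thread and try_consume() in
another. The store to payload is a normal store, which is the kind of access
the explicit barrier() is meant to cover in addition to the fence.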