Konstantin,

In my understanding, a compiler barrier is a purely software barrier: it emits no instruction and only prevents the compiler from moving memory accesses across it, so it should be architecture-independent. The "sync" instruction, by contrast, is a hardware barrier specific to the PowerPC architecture. So I think the compiler barrier should be the same on x86 and PowerPC. Any comments? Please correct me if I am wrong.
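A minimal C sketch of the distinction, assuming a GCC-compatible compiler with extended inline asm. The macro, variable and function names below are hypothetical and are not DPDK APIs. The compiler barrier is an empty asm statement with a "memory" clobber, so it emits no instruction on any architecture. The hardware barrier must emit a real fence chosen per architecture ("sync" on POWER, "mfence" on x86-64, with the `__sync_synchronize()` builtin as a fallback elsewhere).

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: emits no instruction. It only stops the compiler
 * from moving memory accesses across this point, so it is identical on
 * x86 and POWER (same idea as rte_arch_compiler_barrier() in the patch). */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hardware full barrier (hypothetical name): emits a fence instruction,
 * so it has to be selected per architecture. */
#if defined(__powerpc__) || defined(__powerpc64__)
#define hw_full_barrier() __asm__ __volatile__("sync" : : : "memory")
#elif defined(__x86_64__)
#define hw_full_barrier() __asm__ __volatile__("mfence" : : : "memory")
#else
#define hw_full_barrier() __sync_synchronize() /* GCC/Clang builtin */
#endif

static uint32_t payload;        /* data handed from one core to another */
static volatile uint32_t ready; /* flag polled by the consuming core */

/* The compiler keeps the two stores in program order. x86 does not reorder
 * a store with another store, so this is enough there; on a weakly ordered
 * CPU such as POWER, another core may still see ready == 1 first. */
static void publish_compiler_only(uint32_t v)
{
    payload = v;
    compiler_barrier();
    ready = 1;
}

/* The fence also orders the two stores as observed by other cores. */
static void publish_with_fence(uint32_t v)
{
    payload = v;
    hw_full_barrier();
    ready = 1;
}
```

Both functions behave the same in a single thread; the difference only shows up when another core polls `ready` and then reads `payload`.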
Thanks a lot!

Best Regards!
------------------------------
Chao Zhu


From: "Ananyev, Konstantin" <konstantin.anan...@intel.com>
To: Chao CH Zhu/China/IBM at IBMCN, "dev at dpdk.org" <dev at dpdk.org>
Date: 2014/10/16 08:38
Subject: RE: [dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power architecture


Hi,

> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Chao Zhu
> Sent: Friday, September 26, 2014 10:36 AM
> To: dev at dpdk.org
> Subject: [dpdk-dev] [PATCH 02/12] Add atomic operations for IBM Power architecture
>
> The atomic operations implemented with assembly code in DPDK only
> support x86. This patch adds architecture-specific atomic operations for
> IBM Power architecture.
>
> Signed-off-by: Chao Zhu <bjzhuc at cn.ibm.com>
> ---
>  .../common/include/powerpc/arch/rte_atomic.h      | 387 ++++++++++++++++++++
>  .../common/include/powerpc/arch/rte_atomic_arch.h | 318 ++++++++++++++++
>  2 files changed, 705 insertions(+), 0 deletions(-)
>  create mode 100644 lib/librte_eal/common/include/powerpc/arch/rte_atomic.h
>  create mode 100644 lib/librte_eal/common/include/powerpc/arch/rte_atomic_arch.h
>
...
> +
> diff --git a/lib/librte_eal/common/include/powerpc/arch/rte_atomic_arch.h b/lib/librte_eal/common/include/powerpc/arch/rte_atomic_arch.h
> new file mode 100644
> index 0000000..fe5666e
> --- /dev/null
> +
...
> +#define rte_arch_rmb() asm volatile("sync" : : : "memory")
> +
> +#define rte_arch_compiler_barrier() do { \
> +	asm volatile ("" : : : "memory"); \
> +} while(0)

I don't know much about the PPC architecture, but as I remember it uses a weakly-ordered memory model. Is that correct?
If so, then you probably need rte_arch_compiler_barrier() to be a "sync" instruction (like the mb()s above).
The reason is that IA has a much stronger memory ordering model, and there are a lot of places in the code that implicitly rely on that ordering.
For example, the ring enqueue/dequeue functions.

Konstantin
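The ring example can be illustrated with a hypothetical single-producer/single-consumer ring. `struct spsc_ring`, `ring_enqueue()`, `ring_dequeue()` and `ring_barrier()` are illustrative names only; this is not DPDK's rte_ring code. The sketch marks the points where such code relies on ordering: the producer must make the slot write visible before advancing `prod_tail`, and the consumer must read `prod_tail` before the slot and finish reading the slot before advancing `cons_tail`. For ordinary cacheable memory, x86 hardware only reorders a store with a later load, so a compiler barrier is enough at those points. On 64-bit POWER the sketch assumes "lwsync", which orders every pair except store-then-load.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u /* must be a power of two */

/* Orders load->load, load->store and store->store (not store->load). */
#if defined(__powerpc64__)
#define ring_barrier() __asm__ __volatile__("lwsync" : : : "memory")
#elif defined(__x86_64__) || defined(__i386__)
/* x86 already gives these orderings in hardware; only the compiler
 * must be stopped from reordering. */
#define ring_barrier() __asm__ __volatile__("" : : : "memory")
#else
#define ring_barrier() __sync_synchronize() /* conservative full fence */
#endif

/* Hypothetical SPSC ring; counters wrap naturally as uint32_t. */
struct spsc_ring {
    volatile uint32_t prod_tail; /* entries published by the producer */
    volatile uint32_t cons_tail; /* entries released by the consumer */
    void *slots[RING_SIZE];
};

static int ring_enqueue(struct spsc_ring *r, void *obj)
{
    uint32_t prod = r->prod_tail;

    if (prod - r->cons_tail == RING_SIZE)
        return -1; /* full */
    r->slots[prod & (RING_SIZE - 1)] = obj;
    ring_barrier(); /* slot write must be visible before the new tail */
    r->prod_tail = prod + 1;
    return 0;
}

static int ring_dequeue(struct spsc_ring *r, void **obj)
{
    uint32_t cons = r->cons_tail;

    if (r->prod_tail == cons)
        return -1; /* empty */
    ring_barrier(); /* read the tail before reading the slot it covers */
    *obj = r->slots[cons & (RING_SIZE - 1)];
    ring_barrier(); /* finish reading the slot before handing it back */
    r->cons_tail = cons + 1;
    return 0;
}
```

The sketch only places fences where the ring needs them. Keeping rte_arch_compiler_barrier() compiler-only and adding dedicated fences at such points, or making it emit "sync" as suggested above, are two ways to get this ordering on POWER, at different cost.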