Hi Dong,

> -----Original Message-----
> From: Wang Dong [mailto:dong.wang.pro at hotmail.com]
> Sent: Thursday, May 07, 2015 4:28 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for
> IA processor's rte_wmb/rte_rmb.
>
> Hi Konstantin,
>
> > Hi Dong,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of WangDong
> >> Sent: Tuesday, May 05, 2015 4:38 PM
> >> To: dev at dpdk.org
> >> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for
> >> IA processor's rte_wmb/rte_rmb.
> >>
> >> The current implementation of rte_wmb/rte_rmb for x86 uses processor
> >> memory barriers. That is unnecessary on IA processors; a compiler
> >> memory barrier is enough.
> >
> > I wouldn't say they are 'unnecessary'.
> > There are situations, even on IA, when you need _fence_ instructions.
> > So, please leave the rte_*mb() macros unmodified.
>
> OK, leave them unmodified, but I really can't find a situation that needs
> the sfence and lfence instructions.
For example:
http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
http://dpdk.org/ml/archives/dev/2014-May/002613.html

> >
> > I still think that we need to create a new set of architecture-dependent
> > macros, as discussed before.
> > Probably, by analogy with the Linux kernel, rte_smp_*mb() is a good name
> > for them.
> > Though if you have some better name in mind, I am open to suggestions here.
>
> What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1; it looks good.

Hmm, but why _dma_? We need the same thing for multi-core communication too.
If rte_smp_ is not good enough, it might be: rte_arch_?

> >
> >> But if DPDK is running on an AMD processor, maybe we should use a
> >> processor memory barrier.
> >
> > As far as I remember, AMD has the same memory ordering model.
>
> It's too hard to find an AMD software developer manual...

There, for example:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
?

Konstantin

> Dong
>
> > So, I don't think we need #ifdef RTE_ARCH_X86_IA here.
> >
> > Konstantin
> >
> >> I add a macro to distinguish them; if we compile DPDK for an IA
> >> processor, adding the macro (RTE_ARCH_X86_IA) can improve performance
> >> with a compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for
> >> using a processor memory barrier; in this case, if the macro is not
> >> added, memory ordering will not be guaranteed. Which macro is better?
> >> If this patch is applied, the PMDs' old implementation of compiler
> >> memory barriers (some volatile variables) can be replaced with
> >> rte_rmb() and rte_wmb() on any architecture.
> >>
> >> ---
> >>  lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> index e93e8ee..52b1e81 100644
> >> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> @@ -49,10 +49,20 @@ extern "C" {
> >>
> >>  #define rte_mb() _mm_mfence()
> >>
> >> +#ifdef RTE_ARCH_X86_IA
> >> +
> >> +#define rte_wmb() rte_compiler_barrier()
> >> +
> >> +#define rte_rmb() rte_compiler_barrier()
> >> +
> >> +#else
> >> +
> >>  #define rte_wmb() _mm_sfence()
> >>
> >>  #define rte_rmb() _mm_lfence()
> >>
> >> +#endif
> >> +
> >>  /*------------------------- 16 bit atomic operations
> >> -------------------------*/
> >>
> >>  #ifndef RTE_FORCE_INTRINSICS
> >> --
> >> 1.9.1
> >