Hi Dong,

> -----Original Message-----
> From: Wang Dong [mailto:dong.wang.pro at hotmail.com]
> Sent: Thursday, May 07, 2015 4:28 PM
> To: Ananyev, Konstantin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for
> IA processor's rte_wmb/rte_rmb.
>
> Hi Konstantin,
>
> > Hi Dong,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of WangDong
> >> Sent: Tuesday, May 05, 2015 4:38 PM
> >> To: dev at dpdk.org
> >> Subject: [dpdk-dev] [PATCH] librte_eal:Using compiler memory barrier for
> >> IA processor's rte_wmb/rte_rmb.
> >>
> >> The current implementation of rte_wmb/rte_rmb for x86 uses processor
> >> memory barriers. That is unnecessary on IA processors; a compiler
> >> memory barrier is enough.
> >
> > I wouldn't say they are 'unnecessary'.
> > There are situations, even on IA, when you need _fence_ instructions.
> > So, please leave the rte_*mb() macros unmodified.
>
> OK, leave them unmodified, but I really can't find a situation that needs
> the sfence and lfence instructions.
For example:
http://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/
http://dpdk.org/ml/archives/dev/2014-May/002613.html

> >
> > I still think that we need to create a new set of architecture-dependent
> > macros, as discussed before.
> > Probably, by analogy with the Linux kernel, rte_smp_*mb() is a good name
> > for them.
> > Though if you have some better name in mind, I am open to suggestions here.
>
> What about rte_dma_*mb()? I found dma_*mb() in linux-4.0.1; it looks good.

Hmm, but why _dma_? We need the same thing for multi-core communication too.
If rte_smp_ is not good enough, it might be: rte_arch_?

> >
> >> But if DPDK is running on an AMD processor, maybe we should use a
> >> processor memory barrier.
> >
> > As far as I remember, AMD has the same memory ordering model.
>
> It's too hard to find an AMD software developer manual...

There, for example:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/24593_APM_v21.pdf
?

Konstantin

> Dong
>
> > So, I don't think we need #ifdef RTE_ARCH_X86_IA here.
> >
> > Konstantin
> >
> >> I add a macro to distinguish them; if we compile DPDK for an IA
> >> processor, adding the macro (RTE_ARCH_X86_IA) can improve performance
> >> with a compiler memory barrier. Or we can add RTE_ARCH_X86_AMD for
> >> using a processor memory barrier; in this case, if the macro is not
> >> added, memory ordering will not be guaranteed. Which macro is better?
> >> If this patch is applied, the PMDs' old implementation of compiler
> >> memory barriers (some volatile variables) can be replaced with
> >> rte_rmb() and rte_wmb() on any architecture.
> >>
> >> ---
> >>  lib/librte_eal/common/include/arch/x86/rte_atomic.h | 10 ++++++++++
> >>  1 file changed, 10 insertions(+)
> >>
> >> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> index e93e8ee..52b1e81 100644
> >> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
> >> @@ -49,10 +49,20 @@ extern "C" {
> >>
> >>  #define rte_mb() _mm_mfence()
> >>
> >> +#ifdef RTE_ARCH_X86_IA
> >> +
> >> +#define rte_wmb() rte_compiler_barrier()
> >> +
> >> +#define rte_rmb() rte_compiler_barrier()
> >> +
> >> +#else
> >> +
> >>  #define rte_wmb() _mm_sfence()
> >>
> >>  #define rte_rmb() _mm_lfence()
> >>
> >> +#endif
> >> +
> >>  /*------------------------- 16 bit atomic operations
> >> -------------------------*/
> >>
> >>  #ifndef RTE_FORCE_INTRINSICS
> >> --
> >> 1.9.1
> >