On Tue, May 12, 2020 at 11:48 AM Ruifeng Wang <ruifeng.w...@arm.com> wrote:
>
> > -----Original Message-----
> > From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Sent: Tuesday, May 12, 2020 2:07 AM
> > To: dev@dpdk.org; jer...@marvell.com; hemant.agra...@nxp.com; Ajit
> > Khaparde (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>;
> > igo...@amazon.com; tho...@monjalon.net; viachesl...@mellanox.com;
> > arybche...@solarflare.com; Honnappa Nagarahalli
> > <honnappa.nagaraha...@arm.com>
> > Cc: Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>
> > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> >
> > Change the barrier APIs for IO to reflect that Armv8-a is other-multi-copy
> > atomicity memory model.
> >
> > Armv8-a memory model has been strengthened to require other-multi-copy
> > atomicity. This property requires memory accesses from an observer to
> > become visible to all other observers simultaneously [3]. This means
> >
> > a) A write arriving at an endpoint shared between multiple CPUs is
> >    visible to all CPUs
> > b) A write that is visible to all CPUs is also visible to all other
> >    observers in the shareability domain
> >
> > This allows for using cheaper DMB instructions in the place of DSB for
> > devices that are visible to all CPUs (i.e. devices that DPDK caters to).
> >
> > Please refer to [1], [2] and [3] for more information.
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> >
> > Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > ---
> >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > index 7b7099cdc..e406411bb 100644
> > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > @@ -19,11 +19,11 @@ extern "C" {
> >  #include <rte_compat.h>
> >  #include <rte_debug.h>
> >
> > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> >
> > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> >
> > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> >
> >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> >
> > @@ -37,9 +37,9 @@ extern "C" {
> >
> >  #define rte_io_rmb() rte_rmb()
> >
> > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > +#define rte_cio_wmb() rte_wmb()
> >
> > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > +#define rte_cio_rmb() rte_rmb()
> >
> >  /*------------------------ 128 bit atomic operations -------------------------*/
> >
> > --
> > 2.17.1
>
> This change showed about 7% performance gain in testpmd single core NDR test.

I am trying to understand this patch with respect to the current DPDK usage
model.

1) Is the performance improvement due to the fact that the PMD you are using
for testing is supposed to use the existing rte_cio_* barriers but is using
rte_[rw]mb instead?

2) In my understanding:
a) CPU to CPU barrier requirements are addressed by rte_smp_*
b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
c) CPU to ANY (CPU or Device) barrier requirements are addressed by rte_[rw]mb

If (c) is true, then we are violating the DPDK spec with this change. Right?
This change will not be required if the fastpath (CPU to Device) is using
rte_cio_*. Right?

> Tested-by: Ruifeng Wang <ruifeng.w...@arm.com>
>
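
To make the three classes in (2) concrete, here is a minimal sketch of how I
picture them being used. It is not DPDK code. The sketch_* macros, structs and
functions are hypothetical names made up for illustration. On aarch64 the
macros use the instructions I understand each class to map to before this
patch (DMB inner-shareable for (a), DMB outer-shareable for (b), DSB for (c));
on other architectures they fall back to C11 fences only so that the sketch
builds. The "doorbell" is an ordinary variable standing in for a device tail
register, so this only shows where the barrier goes, not real MMIO.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three barrier classes; NOT the DPDK macros. */
#if defined(__aarch64__)
#define sketch_smp_wmb() __asm__ volatile("dmb ishst" : : : "memory") /* (a) CPU to CPU */
#define sketch_cio_wmb() __asm__ volatile("dmb oshst" : : : "memory") /* (b) CPU to device */
#define sketch_wmb()     __asm__ volatile("dsb st" : : : "memory")    /* (c) pre-patch rte_wmb */
#else
/* Portable fallback so the sketch compiles; it orders CPU-visible memory only. */
#define sketch_smp_wmb() atomic_thread_fence(memory_order_release)
#define sketch_cio_wmb() atomic_thread_fence(memory_order_release)
#define sketch_wmb()     atomic_thread_fence(memory_order_seq_cst)
#endif

#define SKETCH_RING_SIZE 8u

/* Hypothetical Tx descriptor living in coherent memory that a device reads. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
};

struct sketch_txq {
	struct sketch_desc ring[SKETCH_RING_SIZE];
	volatile uint32_t doorbell; /* stand-in for the device tail register */
	uint32_t tail;
};

/* (b) Fastpath: fill a descriptor, order it, then ring the doorbell. */
static inline uint32_t
sketch_tx_one(struct sketch_txq *q, uint64_t addr, uint32_t len)
{
	assert(q != NULL);
	uint32_t slot = q->tail % SKETCH_RING_SIZE;

	q->ring[slot].addr = addr;
	q->ring[slot].len = len;
	/* Descriptor stores must be observable before the doorbell store. */
	sketch_cio_wmb();
	q->tail++;
	q->doorbell = q->tail;
	return slot;
}

/* Hypothetical mailbox shared between two lcores. */
struct sketch_mailbox {
	uint32_t payload;
	volatile uint32_t ready;
};

/* (a) CPU to CPU: publish the payload before raising the ready flag. */
static inline void
sketch_publish(struct sketch_mailbox *m, uint32_t value)
{
	assert(m != NULL);
	m->payload = value;
	sketch_smp_wmb();
	m->ready = 1;
}
```

If the fastpath already looks like sketch_tx_one(), i.e. it uses the (b) class
barrier before the doorbell, then it never executes the DSB behind the (c)
class, and changing the (c) class should not affect it.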