> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Tuesday, May 12, 2020 2:07 AM
> To: dev@dpdk.org; jer...@marvell.com; hemant.agra...@nxp.com; Ajit
> Khaparde (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>;
> igo...@amazon.com; tho...@monjalon.net; viachesl...@mellanox.com;
> arybche...@solarflare.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Cc: Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>
> Subject: [RFC] eal: adjust barriers for IO on Armv8-a
>
> Change the barrier APIs for IO to reflect that Armv8-a is other-multi-copy
> atomicity memory model.
>
> Armv8-a memory model has been strengthened to require other-multi-copy
> atomicity. This property requires memory accesses from an observer to
> become visible to all other observers simultaneously [3]. This means
>
> a) A write arriving at an endpoint shared between multiple CPUs is
>    visible to all CPUs
> b) A write that is visible to all CPUs is also visible to all other
>    observers in the shareability domain
>
> This allows for using cheaper DMB instructions in the place of DSB for devices
> that are visible to all CPUs (i.e. devices that DPDK caters to).
>
> Please refer to [1], [2] and [3] for more information.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
>
> Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> ---
>  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> index 7b7099cdc..e406411bb 100644
> --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> @@ -19,11 +19,11 @@ extern "C" {
>  #include <rte_compat.h>
>  #include <rte_debug.h>
>
> -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> +#define rte_mb() asm volatile("dmb osh" : : : "memory")
>
> -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
>
> -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
>
>  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
>
> @@ -37,9 +37,9 @@ extern "C" {
>
>  #define rte_io_rmb() rte_rmb()
>
> -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> +#define rte_cio_wmb() rte_wmb()
>
> -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> +#define rte_cio_rmb() rte_rmb()
>
>  /*------------------------ 128 bit atomic operations -------------------------*/
>
> --
> 2.17.1
This change showed about a 7% performance gain in the testpmd single-core NDR test.

Tested-by: Ruifeng Wang <ruifeng.w...@arm.com>