On Tue, May 12, 2020 at 1:32 PM Ruifeng Wang <ruifeng.w...@arm.com> wrote:
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjac...@gmail.com>
> > Sent: Tuesday, May 12, 2020 2:42 PM
> > To: Ruifeng Wang <ruifeng.w...@arm.com>
> > Cc: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
> > dev@dpdk.org; jer...@marvell.com; hemant.agra...@nxp.com; Ajit
> > Khaparde (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>;
> > igo...@amazon.com; tho...@monjalon.net; viachesl...@mellanox.com;
> > arybche...@solarflare.com; nd <n...@arm.com>
> > Subject: Re: [dpdk-dev] [RFC] eal: adjust barriers for IO on Armv8-a
> >
> > On Tue, May 12, 2020 at 11:48 AM Ruifeng Wang <ruifeng.w...@arm.com>
> > wrote:
> > >
> > > > -----Original Message-----
> > > > From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > > Sent: Tuesday, May 12, 2020 2:07 AM
> > > > To: dev@dpdk.org; jer...@marvell.com; hemant.agra...@nxp.com; Ajit
> > > > Khaparde (ajit.khapa...@broadcom.com) <ajit.khapa...@broadcom.com>;
> > > > igo...@amazon.com; tho...@monjalon.net; viachesl...@mellanox.com;
> > > > arybche...@solarflare.com; Honnappa Nagarahalli
> > > > <honnappa.nagaraha...@arm.com>
> > > > Cc: Ruifeng Wang <ruifeng.w...@arm.com>; nd <n...@arm.com>
> > > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > > >
> > > > Change the barrier APIs for IO to reflect that Armv8-a is
> > > > other-multi-copy atomicity memory model.
> > > >
> > > > Armv8-a memory model has been strengthened to require
> > > > other-multi-copy atomicity. This property requires memory accesses
> > > > from an observer to become visible to all other observers
> > > > simultaneously [3]. This means
> > > >
> > > > a) A write arriving at an endpoint shared between multiple CPUs is
> > > >    visible to all CPUs
> > > > b) A write that is visible to all CPUs is also visible to all other
> > > >    observers in the shareability domain
> > > >
> > > > This allows for using cheaper DMB instructions in the place of DSB
> > > > for devices that are visible to all CPUs (i.e. devices that DPDK caters
> > > > to).
> > > >
> > > > Please refer to [1], [2] and [3] for more information.
> > > >
> > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > > ---
> > > >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > index 7b7099cdc..e406411bb 100644
> > > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > @@ -19,11 +19,11 @@ extern "C" {
> > > >  #include <rte_compat.h>
> > > >  #include <rte_debug.h>
> > > >
> > > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > > >
> > > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > > >
> > > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > > >
> > > >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > > >
> > > > @@ -37,9 +37,9 @@ extern "C" {
> > > >
> > > >  #define rte_io_rmb() rte_rmb()
> > > >
> > > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > +#define rte_cio_wmb() rte_wmb()
> > > >
> > > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > +#define rte_cio_rmb() rte_rmb()
> > > >
> > > >  /*------------------------ 128 bit atomic operations -------------------------*/
> > > >
> > > > --
> > > > 2.17.1
> > >
> > > This change showed about 7% performance gain in testpmd single core
> > > NDR test.
> >
> > I am trying to understand this patch wrt DPDK current usage model?
> >
> > 1) Is performance improvement due to the fact that the PMD that you are
> > using it for testing suppose to use existing rte_cio_* but it was using
> > rte_[rw]mb?
>
> This is part of the reason. There are also cases where rte_io_* was used and
> can be relaxed.
> Such as: http://patches.dpdk.org/patch/68162/
>
> > 2) In my understanding :
> > a) CPU to CPU barrier requirements are addressed by rte_smp_*
> > b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
> > c) CPU to ANY(CPU or Device) are addressed by rte_[rw]mb
> >
> > If (c) is true then we are violating the DPDK spec with change. Right?
>
> Developers are still required to use correct barrier APIs for different use
> cases.
> I think this change mitigates performance penalty when non optimal barrier is
> used.

But does it violate the contract? We use rte_[rw]mb as the heavyweight,
lower-performance barrier that works for all cases. I think that is the
contract with DPDK consumers. For different requirements, we have
specific APIs.

IMO, it makes sense to change the fastpath code to use more fine-grained
barriers based on the need, rather than changing the generic one to a
lightweight one, i.e. rte_[rw]mb is the superset that works in all cases,
and the customized ones are used for specific use cases.

>
> > This change will not be required if fastpath (CPU to Device) is using
> > rte_cio_*.
> > Right?
>
> See 1). Correct usage of rte_cio_* is not the whole.
> For some other use cases, such as barrier between accesses of different
> memory types, we can also use lighter barrier 'dmb'.
>
> > >
> > > Tested-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > >
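
To make the "keep the generic barrier heavyweight, pick the specific one on
the fastpath" point concrete, here is a rough sketch. It is not DPDK code:
the sketch_* names, the queue layout and the doorbell field are all
hypothetical. The Arm instruction strings are the pre-patch rte_wmb() and
rte_cio_wmb() definitions from the diff quoted above; the non-Arm branch is
only a portable fallback so the sketch builds elsewhere, and it is not
equivalent to the Arm instructions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the barrier macros under discussion. */
#if defined(__aarch64__)
/* Generic write barrier: heavyweight, meant to be safe for every case. */
#define sketch_wmb()     __asm__ __volatile__("dsb st" : : : "memory")
/* CPU to coherent-device write barrier: lighter, for the fastpath. */
#define sketch_cio_wmb() __asm__ __volatile__("dmb oshst" : : : "memory")
#else
/* Portable fallback (assumption): C11 fences, only so this compiles. */
#define sketch_wmb()     atomic_thread_fence(memory_order_seq_cst)
#define sketch_cio_wmb() atomic_thread_fence(memory_order_release)
#endif

#define SKETCH_RING_SIZE 8u

/* Hypothetical TX queue: descriptors in coherent memory plus a doorbell. */
struct sketch_txq {
	uint64_t desc[SKETCH_RING_SIZE]; /* descriptor ring read by the device */
	volatile uint32_t doorbell;      /* stands in for a device register */
	uint32_t tail;                   /* next free slot, free-running */
};

/*
 * Fastpath post: the descriptor store must be visible to the device
 * before the doorbell write, so the specific CPU-to-device barrier is
 * used here instead of the generic sketch_wmb().
 */
static inline void
sketch_tx_post(struct sketch_txq *q, uint64_t d)
{
	assert(q != NULL);

	q->desc[q->tail % SKETCH_RING_SIZE] = d;
	sketch_cio_wmb();
	q->tail++;
	q->doorbell = q->tail;
}
```

With this split, a caller that cannot tell which kind of observer it is
ordering against can still reach for sketch_wmb() and stay correct, while
the fastpath pays only for the ordering it actually needs.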