<snip>
> > > > Subject: [RFC] eal: adjust barriers for IO on Armv8-a
> > > >
> > > > Change the barrier APIs for IO to reflect that Armv8-a has an
> > > > other-multi-copy atomic memory model.
> > > >
> > > > The Armv8-a memory model has been strengthened to require
> > > > other-multi-copy atomicity. This property requires memory accesses
> > > > from an observer to become visible to all other observers
> > > > simultaneously [3]. This means:
> > > >
> > > > a) A write arriving at an endpoint shared between multiple CPUs is
> > > >    visible to all CPUs
> > > > b) A write that is visible to all CPUs is also visible to all other
> > > >    observers in the shareability domain
> > > >
> > > > This allows for using cheaper DMB instructions in place of DSB
> > > > for devices that are visible to all CPUs (i.e. the devices that DPDK
> > > > caters to).
> > > >
> > > > Please refer to [1], [2] and [3] for more information.
> > > >
> > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ec71615d824f4f11d38d0e55a88d8956b7e45f
> > > > [2] https://www.youtube.com/watch?v=i6DayghhA8Q
> > > > [3] https://www.cl.cam.ac.uk/~pes20/armv8-mca/
> > > >
> > > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > > ---
> > > >  lib/librte_eal/arm/include/rte_atomic_64.h | 10 +++++-----
> > > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/lib/librte_eal/arm/include/rte_atomic_64.h b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > index 7b7099cdc..e406411bb 100644
> > > > --- a/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > +++ b/lib/librte_eal/arm/include/rte_atomic_64.h
> > > > @@ -19,11 +19,11 @@ extern "C" {
> > > >  #include <rte_compat.h>
> > > >  #include <rte_debug.h>
> > > >
> > > > -#define rte_mb() asm volatile("dsb sy" : : : "memory")
> > > > +#define rte_mb() asm volatile("dmb osh" : : : "memory")
> > > >
> > > > -#define rte_wmb() asm volatile("dsb st" : : : "memory")
> > > > +#define rte_wmb() asm volatile("dmb oshst" : : : "memory")
> > > >
> > > > -#define rte_rmb() asm volatile("dsb ld" : : : "memory")
> > > > +#define rte_rmb() asm volatile("dmb oshld" : : : "memory")
> > > >
> > > >  #define rte_smp_mb() asm volatile("dmb ish" : : : "memory")
> > > >
> > > > @@ -37,9 +37,9 @@ extern "C" {
> > > >
> > > >  #define rte_io_rmb() rte_rmb()
> > > >
> > > > -#define rte_cio_wmb() asm volatile("dmb oshst" : : : "memory")
> > > > +#define rte_cio_wmb() rte_wmb()
> > > >
> > > > -#define rte_cio_rmb() asm volatile("dmb oshld" : : : "memory")
> > > > +#define rte_cio_rmb() rte_rmb()
> > > >
> > > >  /*------------------------ 128 bit atomic operations -------------------------*/
> > > >
> > > > --
> > > > 2.17.1
> > > >
> > > This change showed about a 7% performance gain in the testpmd
> > > single-core NDR test.
> >
> > I am trying to understand this patch wrt the current DPDK usage model.
> >
> > 1) Is the performance improvement due to the fact that the PMD you
> > are using for testing is supposed to use the existing rte_cio_* but
> > was using rte_[rw]mb?

No, it is supposed to use rte_[rw]mb for x86.
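To make the net effect of the two hunks easier to see, here is a small C sketch that tabulates what each AArch64 barrier macro expands to before and after the RFC. The struct `barrier_map`, the table `arm64_barriers` and the helper `barrier_after()` are hypothetical illustrations, not DPDK code; the instruction strings are copied from the diff, and aliased macros (such as `rte_io_rmb()` and the new `rte_cio_[rw]mb()`) are resolved to the instruction they end up emitting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of rte_atomic_64.h before and after the RFC.
 * Each entry records the instruction a macro ultimately emits, either
 * directly or through the macro it aliases. Not a DPDK structure. */
struct barrier_map {
    const char *macro;
    const char *before; /* instruction emitted before the patch */
    const char *after;  /* instruction emitted after the patch */
};

static const struct barrier_map arm64_barriers[] = {
    { "rte_mb",      "dsb sy",    "dmb osh"   },
    { "rte_wmb",     "dsb st",    "dmb oshst" },
    { "rte_rmb",     "dsb ld",    "dmb oshld" },
    { "rte_io_rmb",  "dsb ld",    "dmb oshld" }, /* alias of rte_rmb() */
    { "rte_smp_mb",  "dmb ish",   "dmb ish"   }, /* unchanged */
    { "rte_cio_wmb", "dmb oshst", "dmb oshst" }, /* now via rte_wmb() */
    { "rte_cio_rmb", "dmb oshld", "dmb oshld" }, /* now via rte_rmb() */
};

/* Return the post-patch instruction for a macro name, or NULL if the
 * macro is not in the table. */
static const char *barrier_after(const char *macro)
{
    size_t n = sizeof(arm64_barriers) / sizeof(arm64_barriers[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(arm64_barriers[i].macro, macro) == 0)
            return arm64_barriers[i].after;
    return NULL;
}
```

With this table, `barrier_after("rte_wmb")` and `barrier_after("rte_cio_wmb")` both return `"dmb oshst"`, which is the sense in which the RFC turns the rte_cio_* variants into aliases of rte_[rw]mb on Armv8-a.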
>
> This is part of the reason. There are also cases where rte_io_* was used
> and can be relaxed.
> For example: http://patches.dpdk.org/patch/68162/
>
> > 2) In my understanding:
> > a) CPU to CPU barrier requirements are addressed by rte_smp_*
> > b) CPU to DMA/Device barrier requirements are addressed by rte_cio_*
> > c) CPU to ANY (CPU or Device) barrier requirements are addressed by
> >    rte_[rw]mb
> >
> > If (c) is true, then we are violating the DPDK spec with this change.
> > Right?

No, we are not. Due to the other-multi-copy atomicity of the architecture,
'DMB OSH*' is sufficient to achieve (c).

>
> Developers are still required to use the correct barrier APIs for
> different use cases.
> I think this change mitigates the performance penalty when a non-optimal
> barrier is used.
>
> > This change will not be required if the fastpath (CPU to Device) is
> > using rte_cio_*. Right?

Yes. It is required when the fastpath uses rte_[rw]mb.

>
> See 1). Correct usage of rte_cio_* is not the whole story.
> For some other use cases, such as barriers between accesses of different
> memory types, we can also use the lighter 'dmb' barrier.
>
> > >
> > > Tested-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > >
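The fastpath case discussed in this thread (the CPU fills a descriptor that a device reads via DMA, then rings a doorbell) can be sketched as below. This is a minimal illustration under stated assumptions: `struct tx_desc`, `post_tx()`, `ring`, `doorbell` and `sketch_wmb()` are hypothetical names, `doorbell` is a plain variable standing in for an MMIO register, and `sketch_wmb()` stands in for rte_cio_wmb() and rte_io_wmb()/rte_wmb(), which under this RFC all emit `dmb oshst` on AArch64. The non-Arm fallback to a C11 release fence exists only so the sketch compiles elsewhere; it is not DPDK's definition for other architectures.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the write barriers discussed in the thread.
 * On AArch64 it emits what the RFC assigns to rte_wmb() and
 * rte_cio_wmb(). Elsewhere it falls back to a C11 release fence purely
 * so the sketch compiles; that fallback is an assumption, not DPDK's. */
#if defined(__aarch64__)
#define sketch_wmb() __asm__ volatile("dmb oshst" : : : "memory")
#else
#define sketch_wmb() atomic_thread_fence(memory_order_release)
#endif

#define RING_SIZE  8u
#define DESC_READY 0x1u

/* Hypothetical TX descriptor in DMA-coherent memory. */
struct tx_desc {
    uint64_t buf_addr;
    uint32_t len;
    volatile uint32_t flags; /* device polls this for DESC_READY */
};

/* Stand-ins: in a real PMD the ring lives in DMA memory and the
 * doorbell is a register in a mapped device BAR, not a variable. */
static struct tx_desc ring[RING_SIZE];
static volatile uint32_t doorbell;

static void post_tx(uint32_t idx, uint64_t addr, uint32_t len)
{
    struct tx_desc *d = &ring[idx % RING_SIZE];

    d->buf_addr = addr;
    d->len = len;
    /* Case (b), coherent memory only: payload fields must be visible
     * to the device before the ready flag. DPDK code would typically
     * use rte_cio_wmb() here. */
    sketch_wmb();
    d->flags = DESC_READY;

    /* Different memory types: descriptor writes must be visible before
     * the MMIO doorbell write. DPDK code would typically use
     * rte_io_wmb() or rte_wmb() here. */
    sketch_wmb();
    doorbell = idx + 1; /* new tail index */
}
```

Under the RFC both call sites emit the same `dmb oshst` on Armv8-a. Keeping the two API names distinct still matters because other architectures may map them to different instructions, consistent with the point above that developers must still pick the correct barrier API for each use case.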