Hi Jerin,

Thanks for the review. My comments are inline.

> -----Original Message-----
> From: Jerin Jacob <jerinjac...@gmail.com>
> Sent: Friday, December 20, 2019 11:38 AM
> To: Gavin Hu <gavin...@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <n...@arm.com>; David Marchand
> <david.march...@redhat.com>; tho...@monjalon.net;
> rasl...@mellanox.com; maxime.coque...@redhat.com;
> tiwei....@intel.com; hemant.agra...@nxp.com; jer...@marvell.com;
> Pavan Nikhilesh <pbhagavat...@marvell.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; Phil Yang <phil.y...@arm.com>; Joyce Kong
> <joyce.k...@arm.com>; Steve Capper <steve.cap...@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> aarch64
>
> On Fri, Dec 20, 2019 at 9:03 AM Jerin Jacob <jerinjac...@gmail.com> wrote:
> >
> > On Fri, Dec 20, 2019 at 8:40 AM Gavin Hu <gavin...@arm.com> wrote:
> > >
> > > Armv8's peripheral coherence order is a total order on all reads and writes
> > > to that peripheral.[1]
> > >
> > > The peripheral coherence order for a memory-mapped peripheral signifies the
> > > order in which accesses arrive at the endpoint. For a read or a write RW1
> > > and a read or a write RW2 to the same peripheral, then RW1 will appear in
> > > the peripheral coherence order for the peripheral before RW2 if either of
> > > the following cases apply:
> > > 1. RW1 and RW2 are accesses using Non-cacheable or Device attributes and
> > > RW1 is Ordered-before RW2.
> > > 2. RW1 and RW2 are accesses using Device-nGnRE or Device-nGnRnE attributes
> > > and RW1 appears in program order before RW2.
> > >
> >
> > This is true if RW1 and RW2 addresses are device memory. i.e the
> > registers in the PCI bar address.
> > If RW1 is DDR address which is been used by the controller(say NIC
> > ring descriptor) then there will be an issue.
> > For example Intel i40e driver, the admin queue update in Host DDR
> > memory and it updates the doorbell.
> > In such a case, this patch will create an issue. Correct? Have you
> > checked this patch with ARM64 + XL710 controllers?

This patch relaxes the rte_io_*mb barriers only for pure PCI device memory
accesses. For mixed accesses to DDR and PCI device memory, rte_smp_*mb
(DMB ISH) is not sufficient, but rte_cio_*mb (DMB OSH) is sufficient and
can be used.

> >
> > Some of the legacy code is missing such barriers, that's the reason
> > for adding rte_io_* barrier.
> >
> > More details:
> > https://dev.dpdk.narkive.com/DpIRqDuy/dpdk-dev-patch-v2-i40e-fix-eth-i40e-dev-init-sequence-on-thunderx
> >
> > >
> > > On arm platforms, all the PCI resources are mapped to nGnRE device memory
> > > [2], the above case 2 holds true, that means the peripheral coherence order
> > > applies here and just a compiler barrier is sufficient for rte io
> > > barriers.
> > >
> > > [1] Section B2.3.4 of ARMARM, https://developer.arm.com/docs/ddi0487/lates
> > > t/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
> > > tree/drivers/pci/pci-sysfs.c#n1204
> > >
> > > Signed-off-by: Gavin Hu <gavin...@arm.com>
> > > Reviewed-by: Steve Capper <steve.cap...@arm.com>
> > > Reviewed-by: Phil Yang <phil.y...@arm.com>
> > > ---
> > >  lib/librte_eal/common/include/arch/arm/rte_atomic_64.h | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > > index 859ae12..fd63956 100644
> > > --- a/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > > +++ b/lib/librte_eal/common/include/arch/arm/rte_atomic_64.h
> > > @@ -34,11 +34,11 @@ extern "C" {
> > >
> > >  #define rte_smp_rmb() dmb(ishld)
> > >
> > > -#define rte_io_mb() rte_mb()
> > > +#define rte_io_mb() rte_compiler_barrier()
> > >
> > > -#define rte_io_wmb() rte_wmb()
> > > +#define rte_io_wmb() rte_compiler_barrier()
> > >
> > > -#define rte_io_rmb() rte_rmb()
> > > +#define rte_io_rmb() rte_compiler_barrier()
> > >
> > >  #define rte_cio_wmb() dmb(oshst)
> > >
> > > --
> > > 2.7.4
> > >
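
To illustrate the distinction between pure device-memory accesses and mixed
DDR plus device-memory accesses discussed in this thread, here is a minimal
sketch. It is not DPDK code: every `sketch_*` name, the descriptor layout and
the queue structure are hypothetical, and the macros are only stand-ins that
mirror what the patch would make rte_io_wmb() and what rte_cio_wmb() already
is on aarch64. On other architectures the fence is just a placeholder so the
sketch compiles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real rte_atomic_64.h definitions. */
#if defined(__aarch64__)
/* Mirrors rte_cio_wmb(): DMB OSHST, orders stores across the outer shareable domain. */
#define sketch_cio_wmb() __asm__ volatile("dmb oshst" ::: "memory")
#else
/* Placeholder fence so the sketch builds on non-Arm hosts. */
#define sketch_cio_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif
/* Mirrors rte_io_wmb() after the patch: a compiler barrier only. */
#define sketch_io_wmb() __asm__ volatile("" ::: "memory")

/* Hypothetical descriptor that the device reads from host DDR. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

/* Hypothetical queue: ring lives in DDR, registers live in a PCI BAR. */
struct sketch_queue {
	struct sketch_desc *ring;     /* normal memory (DDR) */
	volatile uint32_t *doorbell;  /* device memory (PCI BAR) */
	volatile uint32_t *ctrl;      /* device memory, same peripheral */
	uint32_t size;
	uint32_t tail;
};

/* Mixed case: DDR write followed by a doorbell write. */
static inline void sketch_post(struct sketch_queue *q, uint64_t addr, uint32_t len)
{
	assert(q->size != 0);
	struct sketch_desc *d = &q->ring[q->tail];

	d->addr = addr;
	d->len = len;
	d->flags = 1;
	q->tail = (q->tail + 1) % q->size;

	/* The descriptor is in DDR, so a compiler barrier is not enough here. */
	sketch_cio_wmb();
	*q->doorbell = q->tail;
}

/* Pure device-memory case: two register writes to the same peripheral. */
static inline void sketch_set_then_kick(struct sketch_queue *q, uint32_t ctrl_val)
{
	*q->ctrl = ctrl_val;
	/* Both accesses are Device-nGnRE, so program order is what matters. */
	sketch_io_wmb();
	*q->doorbell = q->tail;
}
```

In this sketch, a driver path shaped like sketch_post() (the i40e admin queue
case raised above) is the one that would need rte_cio_wmb() rather than the
relaxed rte_io_wmb(), while a path shaped like sketch_set_then_kick() only
touches BAR registers and is the case the patch targets.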