Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <jerinjac...@gmail.com>
> Sent: Friday, December 20, 2019 12:34 PM
> To: Gavin Hu <gavin...@arm.com>
> Cc: dpdk-dev <dev@dpdk.org>; nd <n...@arm.com>; David Marchand
> <david.march...@redhat.com>; tho...@monjalon.net;
> rasl...@mellanox.com; maxime.coque...@redhat.com;
> tiwei....@intel.com; hemant.agra...@nxp.com; jer...@marvell.com;
> Pavan Nikhilesh <pbhagavat...@marvell.com>; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; Phil Yang <phil.y...@arm.com>; Joyce Kong
> <joyce.k...@arm.com>; Steve Capper <steve.cap...@arm.com>
> Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier for
> aarch64
> 
> On Fri, Dec 20, 2019 at 9:49 AM Gavin Hu <gavin...@arm.com> wrote:
> >
> > Hi Jerin,
> >
> > Thanks for review, inline comments,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <jerinjac...@gmail.com>
> > > Sent: Friday, December 20, 2019 11:38 AM
> > > To: Gavin Hu <gavin...@arm.com>
> > > Cc: dpdk-dev <dev@dpdk.org>; nd <n...@arm.com>; David Marchand
> > > <david.march...@redhat.com>; tho...@monjalon.net;
> > > rasl...@mellanox.com; maxime.coque...@redhat.com;
> > > tiwei....@intel.com; hemant.agra...@nxp.com; jer...@marvell.com;
> > > Pavan Nikhilesh <pbhagavat...@marvell.com>; Honnappa Nagarahalli
> > > <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> > > <ruifeng.w...@arm.com>; Phil Yang <phil.y...@arm.com>; Joyce
> Kong
> > > <joyce.k...@arm.com>; Steve Capper <steve.cap...@arm.com>
> > > Subject: Re: [dpdk-dev] [PATCH v2 1/3] eal/arm64: relax the io barrier
> for
> > > aarch64
> > >
> > > On Fri, Dec 20, 2019 at 9:03 AM Jerin Jacob <jerinjac...@gmail.com>
> > > wrote:
> > > >
> > > > On Fri, Dec 20, 2019 at 8:40 AM Gavin Hu <gavin...@arm.com> wrote:
> > > > >
> > > > > Armv8's peripheral coherence order is a total order on all reads and
> > > writes
> > > > > to that peripheral.[1]
> > > > >
> > > > > The peripheral coherence order for a memory-mapped peripheral
> > > signifies the
> > > > > order in which accesses arrive at the endpoint.  For a read or a write
> > > RW1
> > > > > and a read or a write RW2 to the same peripheral, then RW1 will
> appear
> > > in
> > > > > the peripheral coherence order for the peripheral before RW2 if
> either
> > > of
> > > > > the following cases apply:
> > > > >  1. RW1 and RW2 are accesses using Non-cacheable or Device
> attributes
> > > and
> > > > >     RW1 is Ordered-before RW2.
> > > > >  2. RW1 and RW2 are accesses using Device-nGnRE or Device-
> nGnRnE
> > > attributes
> > > > >     and RW1 appears in program order before RW2.
> > > >
> > > >
> > > > This is true if RW1 and RW2 addresses are device memory. i.e the
> > > > registers in the  PCI bar address.
> > > > If RW1 is a DDR address which is being used by the controller (say NIC
> > > > ring descriptor) then there will be an issue.
> > > > For example Intel i40e driver, the admin queue update in Host DDR
> > > > memory and it updates the doorbell.
> > > > In such a case, this patch will create an issue. Correct? Have you
> > > > checked this patch with ARM64 + XL710 controllers?
> >
> > This patch relaxes the rte_io_*mb barriers for pure PCI device memory
> accesses.
> 
> Yes. This would break cases for mixed access for i40e drivers.
> 
> >
> > For mixed accesses of DDR and PCI device memory, rte_smp_*mb(DMB
> ISH) is not sufficient.
> > But rte_cio_*mb(DMB OSH) is sufficient and can be used.
> 
> Yes. Let me share a bit of history.
> 
> 1) There are a lot of drivers(initially developed in x86) that have
> mixed access and don't have any barriers as x86 does not need it.
> 2) rte_io introduced to fix that
> 3) Item (2) introduced the performance issues in the fast path as an
> optimization rte_cio_* introduced.
Exactly. This patch mitigates the performance issue introduced by rte_io
(a 'dsb' is stronger than necessary here).
rte_cio, on the other hand, is definitely required for mixed access.
> 
> So in the current scheme of things, we have APIs to FIX
> portability issue(rte_io) and performance issue(rte_cio).
> IMO, we may not need any change in infra code now. If you think, the
> documentation is missing then we can enhance it.
> If we make infra change then again drivers needs to be updated and tested.
There are no changes to rte_cio, and the semantics and definition of rte_io do
not change either, provided the scope is limited to PCI device memory, which is
the case in the DPDK context(?).
The change lies only in the implementation, right?

I just looked at the link you shared and found that the i40e driver is missing
rte_cio_*mb in i40e_asq_send_command; the old rte_io_*mb was covering for it.
I will submit a new patch in this series that uses rte_cio together with the
new relaxed rte_io, and do more tests.
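
To make the mixed-access case concrete, here is a rough sketch of the pattern
I have in mind. It is not the i40e code and not the DPDK API: sketch_cio_wmb,
struct sketch_desc, struct sketch_queue and sketch_send are made-up names for
illustration only. I am assuming that the descriptor ring lives in host DDR,
that the doorbell is a register in the PCI BAR, and that a 'dmb oshst' is the
store barrier needed between the two; the non-aarch64 fallback is there only
so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a coherent-I/O write barrier (NOT the DPDK macro).
 * On aarch64 it uses DMB OSHST; elsewhere it falls back to a release fence
 * purely so that the sketch compiles. */
#if defined(__aarch64__)
#define sketch_cio_wmb() __asm__ volatile("dmb oshst" ::: "memory")
#else
#define sketch_cio_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)
#endif

/* Hypothetical descriptor layout; a real NIC defines its own. */
struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

struct sketch_queue {
	struct sketch_desc *ring;    /* host DDR, read by the device via DMA */
	volatile uint32_t *doorbell; /* device memory (PCI BAR register) */
	uint32_t size;               /* number of descriptors, power of two */
	uint32_t tail;               /* next free slot */
};

/* Post one descriptor and ring the doorbell; returns the new tail. */
static inline uint32_t
sketch_send(struct sketch_queue *q, uint64_t addr, uint32_t len)
{
	assert(q->size != 0 && (q->size & (q->size - 1)) == 0);

	/* 1. Update the descriptor in DDR (normal memory). */
	struct sketch_desc *d = &q->ring[q->tail];
	d->addr = addr;
	d->len = len;
	d->flags = 1;

	q->tail = (q->tail + 1) & (q->size - 1);

	/* 2. Mixed access: the DDR stores above must be observable by the
	 *    device before the doorbell store below, so an explicit barrier
	 *    is needed between them. */
	sketch_cio_wmb();

	/* 3. Ring the doorbell (device memory). */
	*q->doorbell = q->tail;
	return q->tail;
}
```

The point is step 2: a barrier that only orders accesses to the device itself
does not cover the DDR stores, so the explicit barrier between the descriptor
update and the doorbell write has to be there.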

Yes, this is a big change, but also a big optimization for aarch64; in our
tests it shows very positive results.
But, as the i40e case shows, we must pay attention to places where rte_cio is
missing and the old rte_io was covering for it (the new rte_io will not).

