Hi Andrew,

> -----Original Message-----
> From: Andrew Rybchenko <arybche...@solarflare.com>
> Sent: Monday, April 13, 2020 5:52 PM
> To: Gavin Hu <gavin...@arm.com>; dev@dpdk.org
> Cc: nd <n...@arm.com>; david.march...@redhat.com;
> tho...@monjalon.net; rasl...@mellanox.com; d...@linux.vnet.ibm.com;
> bruce.richard...@intel.com; konstantin.anan...@intel.com;
> ma...@mellanox.com; shah...@mellanox.com; viachesl...@mellanox.com;
> jer...@marvell.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; Phil Yang <phil.y...@arm.com>; Joyce Kong
> <joyce.k...@arm.com>; Steve Capper <steve.cap...@arm.com>
> Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and
> use it for mlx5 PMD
> 
> On 4/11/20 6:46 AM, Gavin Hu wrote:
> > Hi Andrew,
> >
> >> -----Original Message-----
> >> From: Andrew Rybchenko <arybche...@solarflare.com>
> >> Sent: Saturday, April 11, 2020 1:21 AM
> >> To: Gavin Hu <gavin...@arm.com>; dev@dpdk.org
> >> Cc: nd <n...@arm.com>; david.march...@redhat.com;
> >> tho...@monjalon.net; rasl...@mellanox.com; d...@linux.vnet.ibm.com;
> >> bruce.richard...@intel.com; konstantin.anan...@intel.com;
> >> ma...@mellanox.com; shah...@mellanox.com; viachesl...@mellanox.com;
> >> jer...@marvell.com; Honnappa Nagarahalli
> >> <honnappa.nagaraha...@arm.com>; Ruifeng Wang
> >> <ruifeng.w...@arm.com>; Phil Yang <phil.y...@arm.com>; Joyce Kong
> >> <joyce.k...@arm.com>; Steve Capper <steve.cap...@arm.com>
> >> Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and
> >> use it for mlx5 PMD
> >>
> >> On 4/10/20 7:41 PM, Gavin Hu wrote:
> >>> To order writes to various memory types, 'sfence' is required for x86,
> >>> and 'dmb oshst' is required for aarch64.
> >>>
> >>> But within DPDK, there is no abstracted barrier that covers this
> >>> combination: sfence (x86) / dmb (aarch64).
> >>>
> >>> So introduce a new barrier class, rte_dma_*mb, for this combination.
> >>>
> >>> Doorbell rings are a typical use case of this new barrier class, which
> >>> requires the data to be ready in memory before making the HW aware of it.
> >>>
> >>> As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86,
> >>> while rte_wmb is 'dsb' for aarch64.
> >>
> >> As far as I can see rte_cio_wmb() is exactly definition of the barrier
> >> to be used for doorbells. Am I missing something?
> >
> > I understand rte_cio_wmb is for DMA buffers, for example descriptors and
> > work queues, located in host memory but shared between the CPU and the IO
> > device.
> > rte_io_wmb is for MMIO regions.
> > We are missing the barriers for various memory types, e.g. the doorbell cases.
> 
> When the patch series is applied, we'll have 5 types of memory
> barriers: regular, smp, cio, io, dma. Do we really need so
> many? Maybe we need a table in the description which could
> help to make the right choice, i.e. the type of access on both
> axes and the type of barrier to use at the intersection.
Yes, good suggestion!
Actually, Honnappa and I have already made such a table.
Will provide it in the next version!
Thanks for your suggestions!
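
For reference, a minimal sketch of what the per-arch mapping could look like,
assuming the rte_dma_wmb naming from the cover letter (illustrative only, not
the actual patch contents):

/*
 * Illustrative sketch only, not the patch contents: a possible per-arch
 * expansion of the proposed rte_dma_wmb(), following the mapping described
 * in the cover letter (sfence on x86, dmb oshst on aarch64).
 */
#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
#include <xmmintrin.h>
/* SFENCE orders earlier WB stores (e.g. descriptors in host memory)
 * before later WC/UC stores (e.g. the doorbell register). */
#define rte_dma_wmb() _mm_sfence()
#elif defined(RTE_ARCH_ARM64)
/* DMB OSHST orders stores to Normal and Device memory within the outer
 * shareable domain; weaker than the DSB used by rte_wmb(). */
#define rte_dma_wmb() asm volatile("dmb oshst" : : : "memory")
#endif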
> 
> > There is an implication in the definition of rte_cio_wmb: it cannot be used
> > for non-coherent MMIO regions (WC?).
> >
> > http://code.dpdk.org/dpdk/v20.02/source/lib/librte_eal/common/include/generic/rte_atomic.h#L124
> >> Maybe it is just a bug in rte_cio_wmb() on x86?
> > rte_cio_wmb is OK for doorbells on aarch64, but looking through the
> > kernel code, 'sfence' is required for various/mixed memory types.
> > DPDK mlx5 PMD uses rte_cio_wmb widely and wisely; it orders sequences
> > of writes to host memory that is shared with the IO device.
> > Strengthening rte_cio_wmb may hurt performance, so a new barrier class
> > is introduced to optimize for aarch64, in the fast path only, while not
> > impacting x86.
> >
> > http://code.dpdk.org/dpdk/v20.02/source/drivers/net/mlx5/mlx5_rxtx.c#L1087
> 
> Maybe it is my problem that I don't fully understand the real-life
> use cases where cio should be used in accordance with its
> current definition. Does it make sense without a doorbell? Does
> the HW poll via DMA?
> 
> Thanks for explanations,
> Andrew.
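
For the doorbell case discussed above, the intended usage pattern is roughly
the following (a generic sketch with hypothetical structure and function
names, not mlx5 code; it assumes the proposed rte_dma_wmb() from this series):

#include <stdint.h>

/* Hypothetical ring structure, for illustration only. */
struct ring {
        uint64_t *descs;            /* descriptor ring in host memory (WB) */
        volatile uint32_t *db;      /* device doorbell register (MMIO/WC) */
        uint16_t mask;
};

static inline void
ring_post(struct ring *r, uint16_t tail, uint64_t desc)
{
        /* 1. Write the descriptor into host memory shared with the device. */
        r->descs[tail & r->mask] = desc;

        /* 2. Order the descriptor write before the doorbell write; this is
         *    where the proposed rte_dma_wmb() would go (sfence on x86,
         *    dmb oshst on aarch64). */
        rte_dma_wmb();

        /* 3. Ring the doorbell; the HW may now fetch the descriptor. */
        *r->db = tail;
}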
> 
> >>
> >>> In the joint preliminary testing between Arm and Ampere, an 8%~13%
> >>> performance boost was measured.
> >>>
> >>> As there are no functional changes, it will not impact x86.
> >>>
> >>> Gavin Hu (6):
> >>>    eal: introduce new class of barriers for DMA use cases
> >>>    net/mlx5: dmb for immediate doorbell ring on aarch64
> >>>    net/mlx5: relax barrier to order UAR writes on aarch64
> >>>    net/mlx5: relax barrier for aarch64
> >>>    net/mlx5: add descriptive comment for a barrier
> >>>    doc: clarify one configuration in mlx5 guide
> >>>
> >>> Phil Yang (1):
> >>>    net/mlx5: relax ordering for multi-packet RQ buffer refcnt
> >>>
> >>>   doc/guides/nics/mlx5.rst                    |  6 ++--
> >>>   drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
> >>>   drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
> >>>   drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
> >>>   lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
> >>>   lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
> >>>   lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
> >>>   lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
> >>>   lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
> >>>   9 files changed, 78 insertions(+), 15 deletions(-)
> >>>
> >

