Hi Andrew, > -----Original Message----- > From: Andrew Rybchenko <arybche...@solarflare.com> > Sent: Saturday, April 11, 2020 1:21 AM > To: Gavin Hu <gavin...@arm.com>; dev@dpdk.org > Cc: nd <n...@arm.com>; david.march...@redhat.com; > tho...@monjalon.net; rasl...@mellanox.com; d...@linux.vnet.ibm.com; > bruce.richard...@intel.com; konstantin.anan...@intel.com; > ma...@mellanox.com; shah...@mellanox.com; viachesl...@mellanox.com; > jer...@marvell.com; Honnappa Nagarahalli > <honnappa.nagaraha...@arm.com>; Ruifeng Wang > <ruifeng.w...@arm.com>; Phil Yang <phil.y...@arm.com>; Joyce Kong > <joyce.k...@arm.com>; Steve Capper <steve.cap...@arm.com> > Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and > use it for mlx5 PMD > > On 4/10/20 7:41 PM, Gavin Hu wrote: > > To order writes to various memory types, 'sfence' is required for x86, > > and 'dmb oshst' is required for aarch64. > > > > But within DPDK, there is no abstracted barriers covers this > > combination: sfence(x86)/dmb(aarch64). > > > > So introduce a new barrier class - rte_dma_*mb for this combination, > > > > Doorbell rings are typical use cases of this new barrier class, which > > requires something ready in the memory before letting HW aware. > > > > As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86, > while > > rte_wmb is 'dsb' for aarch64. > > As far as I can see rte_cio_wmb() is exactly definition of the barrier > to be used for doorbells. Am I missing something?
I understand rte_cio_wmb is for DMA buffers, for examples, descriptors, work queues, located in the host memory, but shared between CPU and IO device. rte_io_wmb is for MMIO regions. We are missing the barriers for various memory types, eg. Doorbell cases. There is an implication in the definition of rte_cio_wmb, it can not be used for non-coherent MMIO region(WC?) http://code.dpdk.org/dpdk/v20.02/source/lib/librte_eal/common/include/generic/rte_atomic.h#L124 > May be it is just a bug in rte_cio_wmb() on x86? rte_cio_wmb is ok for doorbells on aarch64, but looking through the kernel code, 'sfence' is required for various/mixed memory types. DPDK mlx5 PMD uses rte_cio_wmb widely and wisely, it orders sequences of writes to host memory that shared by IO device. Strengthening rte_cio_wmb may hurt performance, so a new barrier class is introduced to optimize for aarch64, in the fast path only, while not impacting x86. http://code.dpdk.org/dpdk/v20.02/source/drivers/net/mlx5/mlx5_rxtx.c#L1087 /Gavin > > > In the joint preliminary testing between Arm and Ampere, 8%~13% > > performance boost was measured. > > > > As there is no functionality changes, it will not impact x86. > > > > Gavin Hu (6): > > eal: introduce new class of barriers for DMA use cases > > net/mlx5: dmb for immediate doorbell ring on aarch64 > > net/mlx5: relax barrier to order UAR writes on aarch64 > > net/mlx5: relax barrier for aarch64 > > net/mlx5: add descriptive comment for a barrier > > doc: clarify one configuration in mlx5 guide > > > > Phil Yang (1): > > net/mlx5: relax ordering for multi-packet RQ buffer refcnt > > > > doc/guides/nics/mlx5.rst | 6 ++-- > > drivers/net/mlx5/mlx5_rxq.c | 2 +- > > drivers/net/mlx5/mlx5_rxtx.c | 16 ++++++----- > > drivers/net/mlx5/mlx5_rxtx.h | 14 ++++++---- > > lib/librte_eal/arm/include/rte_atomic_32.h | 6 ++++ > > lib/librte_eal/arm/include/rte_atomic_64.h | 6 ++++ > > lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++ > > lib/librte_eal/ppc/include/rte_atomic.h | 6 ++++ > > lib/librte_eal/x86/include/rte_atomic.h | 6 ++++ > > 9 files changed, 78 insertions(+), 15 deletions(-) > >