On 4/11/20 6:46 AM, Gavin Hu wrote:
> Hi Andrew,
> 
>> -----Original Message-----
>> From: Andrew Rybchenko <arybche...@solarflare.com>
>> Sent: Saturday, April 11, 2020 1:21 AM
>> To: Gavin Hu <gavin...@arm.com>; dev@dpdk.org
>> Cc: nd <n...@arm.com>; david.march...@redhat.com;
>> tho...@monjalon.net; rasl...@mellanox.com; d...@linux.vnet.ibm.com;
>> bruce.richard...@intel.com; konstantin.anan...@intel.com;
>> ma...@mellanox.com; shah...@mellanox.com; viachesl...@mellanox.com;
>> jer...@marvell.com; Honnappa Nagarahalli
>> <honnappa.nagaraha...@arm.com>; Ruifeng Wang
>> <ruifeng.w...@arm.com>; Phil Yang <phil.y...@arm.com>; Joyce Kong
>> <joyce.k...@arm.com>; Steve Capper <steve.cap...@arm.com>
>> Subject: Re: [dpdk-dev] [PATCH RFC v2 0/7] introduce new barrier class and
>> use it for mlx5 PMD
>>
>> On 4/10/20 7:41 PM, Gavin Hu wrote:
>>> To order writes to various memory types, 'sfence' is required for x86,
>>> and 'dmb oshst' is required for aarch64.
>>>
>>> But within DPDK, there is no abstracted barriers covers this
>>> combination: sfence(x86)/dmb(aarch64).
>>>
>>> So introduce a new barrier class - rte_dma_*mb for this combination,
>>>
>>> Doorbell rings are typical use cases of this new barrier class, which
>>> requires something ready in the memory before letting HW aware.
>>>
>>> As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86,
>> while
>>> rte_wmb is 'dsb' for aarch64.
>>
>> As far as I can see rte_cio_wmb() is exactly definition of the barrier
>> to be used for doorbells. Am I missing something?
> 
> I understand rte_cio_wmb is for DMA buffers, for examples, descriptors, work 
> queues, located in the host memory, but shared between CPU and IO device.
> rte_io_wmb is for MMIO regions. 
> We are missing the barriers for various memory types, eg. Doorbell cases.

When the patch series is applied, we'll have 5 types of memory
barriers: regular, smp, cio, io, dma. Do we really need so
many? May be we need a table in description which could
help to make the right choice. I.e. type of access on both
axis and type of barrier to use on intersection.

> There is an implication in the definition of rte_cio_wmb, it can not be used 
> for non-coherent MMIO region(WC?)
> http://code.dpdk.org/dpdk/v20.02/source/lib/librte_eal/common/include/generic/rte_atomic.h#L124
>> May be it is just a bug in rte_cio_wmb() on x86?
> rte_cio_wmb is ok for doorbells on aarch64, but looking through the kernel 
> code, 'sfence' is required for various/mixed memory types.
> DPDK mlx5 PMD uses rte_cio_wmb widely and wisely, it orders sequences of 
> writes to host memory that shared by IO device.
> Strengthening rte_cio_wmb may hurt performance, so a new barrier class is 
> introduced to optimize for aarch64, in the fast path only, while not 
> impacting x86.
> http://code.dpdk.org/dpdk/v20.02/source/drivers/net/mlx5/mlx5_rxtx.c#L1087

May be my problem that I don't fully understand real-life
usecases when cio should be used in accordance with its
current definition. Does it make sense without doorbell?
Does HW polling via DMA?

Thanks for explanations,
Andrew.

>>
>>> In the joint preliminary testing between Arm and Ampere, 8%~13%
>>> performance boost was measured.
>>>
>>> As there is no functionality changes, it will not impact x86.
>>>
>>> Gavin Hu (6):
>>>    eal: introduce new class of barriers for DMA use cases
>>>    net/mlx5: dmb for immediate doorbell ring on aarch64
>>>    net/mlx5: relax barrier to order UAR writes on aarch64
>>>    net/mlx5: relax barrier for aarch64
>>>    net/mlx5: add descriptive comment for a barrier
>>>    doc: clarify one configuration in mlx5 guide
>>>
>>> Phil Yang (1):
>>>    net/mlx5: relax ordering for multi-packet RQ buffer refcnt
>>>
>>>   doc/guides/nics/mlx5.rst                    |  6 ++--
>>>   drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
>>>   drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
>>>   drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
>>>   lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
>>>   lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
>>>   lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
>>>   lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
>>>   lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
>>>   9 files changed, 78 insertions(+), 15 deletions(-)
>>>
> 

Reply via email to