To order writes to various memory types, 'sfence' is required for x86, and 'dmb oshst' is required for aarch64.
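
As a minimal sketch of the ordering described above (not the code in this series), the snippet below maps one write barrier to 'sfence' on x86 and 'dmb oshst' on aarch64, then uses it before a doorbell write. The names example_dma_wmb, example_desc and example_ring_doorbell are hypothetical and exist only for illustration; the fallback branch for other architectures is likewise an assumption.

```c
#include <stdint.h>

/* Hypothetical macro for illustration only; this is not the DPDK API. */
#if defined(__x86_64__) || defined(__i386__)
/* x86: sfence orders stores, including those to write-combining memory. */
#define example_dma_wmb() __asm__ volatile("sfence" ::: "memory")
#elif defined(__aarch64__)
/* aarch64: store-store barrier over the outer shareable domain. */
#define example_dma_wmb() __asm__ volatile("dmb oshst" ::: "memory")
#else
/* Assumed conservative fallback for other architectures. */
#define example_dma_wmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Hypothetical descriptor layout, standing in for a real HW descriptor. */
struct example_desc {
	uint64_t addr;
	uint32_t len;
};

/*
 * Fill a descriptor, then ring the doorbell. The barrier keeps the
 * descriptor stores ordered before the doorbell store, so the device
 * does not observe the doorbell ahead of the descriptor contents.
 */
static void
example_ring_doorbell(struct example_desc *ring, volatile uint32_t *doorbell,
		      uint32_t idx, uint64_t addr, uint32_t len)
{
	ring[idx].addr = addr;
	ring[idx].len = len;
	example_dma_wmb();
	*doorbell = idx + 1;
}
```

In a real driver the doorbell pointer would refer to a mapped device register rather than ordinary memory; the sketch only shows where the barrier sits relative to the two groups of stores.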
However, within DPDK, no abstracted barrier covers this combination:
sfence (x86) / dmb (aarch64).

So introduce a new barrier class, rte_dma_*mb, for this combination.

Doorbell rings are the typical use case for this new barrier class: they
require the data to be ready in memory before the HW is made aware of it.

As a note, rte_io_wmb and rte_cio_wmb are compiler barriers for x86, while
rte_wmb is 'dsb' for aarch64.

In joint preliminary testing between Arm and Ampere, an 8%~13% performance
boost was measured.

As there are no functional changes, x86 is not impacted.

Gavin Hu (6):
  eal: introduce new class of barriers for DMA use cases
  net/mlx5: dmb for immediate doorbell ring on aarch64
  net/mlx5: relax barrier to order UAR writes on aarch64
  net/mlx5: relax barrier for aarch64
  net/mlx5: add descriptive comment for a barrier
  doc: clarify one configuration in mlx5 guide

Phil Yang (1):
  net/mlx5: relax ordering for multi-packet RQ buffer refcnt

 doc/guides/nics/mlx5.rst                    |  6 ++--
 drivers/net/mlx5/mlx5_rxq.c                 |  2 +-
 drivers/net/mlx5/mlx5_rxtx.c                | 16 ++++++-----
 drivers/net/mlx5/mlx5_rxtx.h                | 14 ++++++----
 lib/librte_eal/arm/include/rte_atomic_32.h  |  6 ++++
 lib/librte_eal/arm/include/rte_atomic_64.h  |  6 ++++
 lib/librte_eal/include/generic/rte_atomic.h | 31 +++++++++++++++++++++
 lib/librte_eal/ppc/include/rte_atomic.h     |  6 ++++
 lib/librte_eal/x86/include/rte_atomic.h     |  6 ++++
 9 files changed, 78 insertions(+), 15 deletions(-)

--
2.17.1