Hi, Phil

This code is on the datapath, where performance is critical.
rte_cio_wmb() may take a lot of CPU cycles, because it waits until all
previous writes become visible to all agents external to the core. The
Tx CQE doorbell update does not need writes to any other location to be
completed. The only concern is that writes to the same doorbell register
of the same send queue must not be reordered or merged, either within
the internal sending loop of tx_burst() or across subsequent calls.

As far as I know, writes to the same location should not be reordered on
any architecture (they may be merged if the memory settings allow it,
which is not critical for the CQE doorbell). Could you please explain
why we need an explicit hardware fence before the CQE doorbell update?
Do you think the doorbell write might be reordered with the preceding
reads from the ring buffer?
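
To make the pattern under discussion concrete, here is a minimal
sketch. It is not the actual mlx5 code: struct txq_sketch,
cq_doorbell_update(), demo_burst() and the local cpu_to_be_32() helper
are hypothetical stand-ins, the doorbell record is ordinary memory, and
the barrier macro assumes GCC/Clang inline asm syntax. It only shows
repeated stores to one doorbell location with a compiler barrier in
front of each, and marks the spot where the patch would put
rte_cio_wmb().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compiler-only barrier (GCC/Clang syntax): emits no CPU fence instruction. */
#define compiler_barrier() __asm__ volatile ("" ::: "memory")

/* Hypothetical, trimmed-down stand-in for the Tx queue state. */
struct txq_sketch {
	volatile uint32_t *cq_db; /* CQ doorbell record */
	uint16_t cq_ci;           /* CQ consumer index */
};

/* Return v laid out in memory as big-endian, whatever the host order is. */
static uint32_t
cpu_to_be_32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Publish the current consumer index through the doorbell record. */
static void
cq_doorbell_update(struct txq_sketch *txq)
{
	/*
	 * Keeps the compiler from moving memory accesses across this point.
	 * The patch would place rte_cio_wmb() here instead.
	 */
	compiler_barrier();
	*txq->cq_db = cpu_to_be_32(txq->cq_ci);
}

/* Simulate n completions, each followed by a doorbell update. */
static uint32_t
demo_burst(uint16_t n)
{
	uint32_t db = 0; /* plain memory standing in for the doorbell record */
	struct txq_sketch txq = { .cq_db = &db, .cq_ci = 0 };

	for (uint16_t i = 0; i < n; i++) {
		txq.cq_ci++;
		cq_doorbell_update(&txq);
	}
	/* All stores hit the same location, so the last index is what remains. */
	assert(db == cpu_to_be_32(n));
	return db;
}
```

The sketch runs on a single core against normal memory, so it cannot
show what the device observes; it is only meant to pin down which store
and which barrier the question is about.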

WBR,
Slava

> -----Original Message-----
> From: Phil Yang <phil.y...@arm.com>
> Sent: Thursday, September 5, 2019 13:55
> To: Yongseok Koh <ys...@mellanox.com>; Slava Ovsiienko
> <viachesl...@mellanox.com>; Matan Azrad <ma...@mellanox.com>; Nélio
> Laranjeiro <nelio.laranje...@6wind.com>; dev@dpdk.org
> Cc: Thomas Monjalon <tho...@monjalon.net>; jer...@marvell.com;
> honnappa.nagaraha...@arm.com; gavin...@arm.com; n...@arm.com;
> sta...@dpdk.org
> Subject: [PATCH 2/2] net/mlx5: fix Tx CQ doorbell synchronization on
> aarch64
> 
> For the weaker memory model processors, the compiler barrier is not
> sufficient to guarantee the coherent memory update be observed by I/O
> device. It needs the coherent I/O memory barrier to enforce the ordering of
> Tx completion queue doorbell operation.
> 
> Fixes: da1df1ccabad ("net/mlx5: fix completion queue drain loop")
> Cc: sta...@dpdk.org
> 
> Suggested-by: Gavin Hu <gavin...@arm.com>
> Signed-off-by: Phil Yang <phil.y...@arm.com>
> Reviewed-by: Gavin Hu <gavin...@arm.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> index 4c01187..c11148b 100644
> --- a/drivers/net/mlx5/mlx5_rxtx.c
> +++ b/drivers/net/mlx5/mlx5_rxtx.c
> @@ -2042,7 +2042,7 @@ mlx5_tx_comp_flush(struct mlx5_txq_data *restrict txq,
>       } else {
>               return;
>       }
> -     rte_compiler_barrier();
> +     rte_cio_wmb();
>       *txq->cq_db = rte_cpu_to_be_32(txq->cq_ci);
>       if (likely(tail != txq->elts_tail)) {
>               mlx5_tx_free_elts(txq, tail, olx);
> --
> 2.7.4
