mlx5: fix Tx CQ doorbell synchronization on aarch64

Phil Yang (Arm Technology China) Fri, 06 Sep 2019 00:21:01 -0700

Hi, Slava

Thanks for your comments.


> -----Original Message-----
> From: Slava Ovsiienko <[email protected]>
> Sent: Thursday, September 5, 2019 8:12 PM
> To: Phil Yang (Arm Technology China) <[email protected]>;
> [email protected]; Matan Azrad <[email protected]>; Nélio
> Laranjeiro <[email protected]>; [email protected]
> Cc: [email protected]; [email protected]; Honnappa Nagarahalli
> <[email protected]>; Gavin Hu (Arm Technology China)
> <[email protected]>; nd <[email protected]>; [email protected]
> Subject: RE: [PATCH 2/2] net/mlx5: fix Tx CQ doorbell synchronization on
> aarch64
> 
> Hi, Phil
> 
> This point is in datapath and performance is very critical.
> The rte_cio_wmb() may take a lot of CPU cycles, waiting till all previous
> writes become visible for all external (relating to core) agents. 
> The Tx CQE doorbelling does not need any writes to other locations to be 
> completed, 

In my understanding, the PMD needs to wait until all txq field updates have
completed before it rings the doorbell for the HW.
Before the Tx CQE doorbell write, it updates the work queue producer index
in the Tx queue descriptor (at line 2037).
On a weakly ordered CPU, a compiler barrier cannot guarantee the ordering of
these operations, so the patch uses an explicit HW fence to enforce it.
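
To illustrate where the fence sits, here is a minimal sketch. It is not the
PMD code: struct demo_txq, demo_tx_comp_flush() and demo_cpu_to_be_32() are
hypothetical names, and the C11 atomic_thread_fence() only stands in for
rte_cio_wmb() so that the example builds without DPDK. The instruction a C11
fence emits may differ from what rte_cio_wmb() emits on aarch64, so treat it
as a picture of the ordering, not of the exact barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the Tx queue state. */
struct demo_txq {
	uint16_t wqe_pi;          /* work queue producer index, updated first */
	uint32_t cq_ci;           /* completion queue consumer index */
	volatile uint32_t *cq_db; /* doorbell record that the device reads */
};

/* Host-to-big-endian conversion (stand-in for rte_cpu_to_be_32()). */
static uint32_t demo_cpu_to_be_32(uint32_t v)
{
	const union { uint16_t u; uint8_t b[2]; } probe = { .u = 1 };

	if (probe.b[0] == 0)
		return v; /* host is already big-endian */
	return ((v & 0xffu) << 24) | ((v & 0xff00u) << 8) |
	       ((v >> 8) & 0xff00u) | (v >> 24);
}

/* Update the queue state, fence, then ring the CQ doorbell. */
static void demo_tx_comp_flush(struct demo_txq *txq, uint16_t wqe_counter)
{
	assert(txq != NULL && txq->cq_db != NULL);

	/* 1. Ordinary store to the queue descriptor. */
	txq->wqe_pi = wqe_counter;

	/*
	 * 2. A compiler-only barrier, e.g.
	 *    atomic_signal_fence(memory_order_seq_cst), would stop the
	 *    compiler from reordering but emits no CPU instruction.
	 *    The fence below is an assumption standing in for rte_cio_wmb().
	 */
	atomic_thread_fence(memory_order_release);

	/* 3. Doorbell store, issued only after the fence. */
	*txq->cq_db = demo_cpu_to_be_32(txq->cq_ci);
}
```

Calling demo_tx_comp_flush() with cq_ci set to 7 leaves the doorbell record
holding 7 in big-endian byte order, written after the wqe_pi store in
program order.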

This is the same as the HW Tx doorbell in the vectorized Tx burst routine,
which uses a write memory barrier to make the register update visible to the
HW immediately.
See Section 32.5.2 in https://doc.dpdk.org/guides/nics/mlx5.html

> the only concern is not to reorder/merge the writes to the same doorbell
> register of the same sending queue in the tx_burst() internal sending
> loop/subsequent calls.
>
> As far as I know - the writes to the same location should not be reordered
> by any arch (may be merged if memory settings allow this, it is not
> critical for CQE doorbell), could you, please, explain why we need explicit
> hardware fence before CQE doorbell update? Do you think doorbell write
> might be rearranged with previously reads from the ring buffer?
> 
> WBR,
> Slava
> 
> > -----Original Message-----
> > From: Phil Yang <[email protected]>
> > Sent: Thursday, September 5, 2019 13:55
> > To: Yongseok Koh <[email protected]>; Slava Ovsiienko
> > <[email protected]>; Matan Azrad <[email protected]>; Nélio
> > Laranjeiro <[email protected]>; [email protected]
> > Cc: Thomas Monjalon <[email protected]>; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]
> > Subject: [PATCH 2/2] net/mlx5: fix Tx CQ doorbell synchronization on
> > aarch64
> >
> > For the weaker memory model processors, the compiler barrier is not
> > sufficient to guarantee the coherent memory update be observed by I/O
> > device. It needs the coherent I/O memory barrier to enforce the ordering
> > of Tx completion queue doorbell operation.
> >
> > Fixes: da1df1ccabad ("net/mlx5: fix completion queue drain loop")
> > Cc: [email protected]
> >
> > Suggested-by: Gavin Hu <[email protected]>
> > Signed-off-by: Phil Yang <[email protected]>
> > Reviewed-by: Gavin Hu <[email protected]>
> > ---
> >  drivers/net/mlx5/mlx5_rxtx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_rxtx.c b/drivers/net/mlx5/mlx5_rxtx.c
> > index 4c01187..c11148b 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx.c
> > +++ b/drivers/net/mlx5/mlx5_rxtx.c
> > @@ -2042,7 +2042,7 @@ mlx5_tx_comp_flush(struct mlx5_txq_data *restrict txq,
> >     } else {
> >             return;
> >     }
> > -   rte_compiler_barrier();
> > +   rte_cio_wmb();
> >     *txq->cq_db = rte_cpu_to_be_32(txq->cq_ci);
> >     if (likely(tail != txq->elts_tail)) {
> >             mlx5_tx_free_elts(txq, tail, olx);
> > --
> > 2.7.4
