Hi, Ruifeng

Could we go further and move the loop inside the conditional?
Like this:
if (mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1) {
        for (i = 0; i < n; ++i) {
                void *buf_addr = elts[i]->buf_addr;

                wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
                                              RTE_PKTMBUF_HEADROOM);
                wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
        }
} else {
        for (i = 0; i < n; ++i) {
                void *buf_addr = elts[i]->buf_addr;

                wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
                                              RTE_PKTMBUF_HEADROOM);
        }
}
What do you think?
Also, we should check that performance on other archs is not affected.

With best regards,
Slava

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.w...@arm.com>
> Sent: Tuesday, June 1, 2021 11:31
> To: Raslan Darawsheh <rasl...@nvidia.com>; Matan Azrad
> <ma...@nvidia.com>; Shahaf Shuler <shah...@nvidia.com>; Slava Ovsiienko
> <viachesl...@nvidia.com>
> Cc: dev@dpdk.org; jer...@marvell.com; n...@arm.com;
> honnappa.nagaraha...@arm.com; Ruifeng Wang <ruifeng.w...@arm.com>
> Subject: [PATCH 2/2] net/mlx5: reduce unnecessary memory access
> 
> MR btree len is a constant during Rx replenish.
> Moved retrieval of the value out of the loop to reduce data loads.
> Slight performance uplift was measured on N1SDP.
> 
> Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx_vec.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c
> b/drivers/net/mlx5/mlx5_rxtx_vec.c
> index d5af2d91ff..fc7e2a7f41 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec.c
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
> @@ -95,6 +95,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data
> *rxq)
>       volatile struct mlx5_wqe_data_seg *wq =
>               &((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[elts_idx];
>       unsigned int i;
> +     uint16_t btree_len;
> 
>       if (n >= rxq->rq_repl_thresh) {
>               MLX5_ASSERT(n >=
> MLX5_VPMD_RXQ_RPLNSH_THRESH(q_n));
> @@ -106,6 +107,8 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data
> *rxq)
>                       rxq->stats.rx_nombuf += n;
>                       return;
>               }
> +
> +             btree_len = mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh);
>               for (i = 0; i < n; ++i) {
>                       void *buf_addr;
> 
> @@ -119,8 +122,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data
> *rxq)
>                       wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
> 
> RTE_PKTMBUF_HEADROOM);
>                       /* If there's a single MR, no need to replace LKey. */
> -                     if (unlikely(mlx5_mr_btree_len(&rxq-
> >mr_ctrl.cache_bh)
> -                                  > 1))
> +                     if (unlikely(btree_len > 1))
>                               wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
>               }
>               rxq->rq_ci += n;
> --
> 2.25.1