> -----Original Message-----
> From: Slava Ovsiienko <viachesl...@nvidia.com>
> Sent: Friday, July 2, 2021 3:06 PM
> To: Ruifeng Wang <ruifeng.w...@arm.com>; Raslan Darawsheh
> <rasl...@nvidia.com>; Matan Azrad <ma...@nvidia.com>; Shahaf Shuler
> <shah...@nvidia.com>
> Cc: dev@dpdk.org; jer...@marvell.com; nd <n...@arm.com>; Honnappa
> Nagarahalli <honnappa.nagaraha...@arm.com>
> Subject: RE: [PATCH 2/2] net/mlx5: reduce unnecessary memory access
>
> Hi, Ruifeng
>
> Could we go further and implement loop inside the conditional?
> Like this:
> if (mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1) {
> 	for (i = 0; i < n; ++i) {
> 		void *buf_addr = elts[i]->buf_addr;
>
> 		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
> 					      RTE_PKTMBUF_HEADROOM);
> 		wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
> 	}
> } else {
> 	for (i = 0; i < n; ++i) {
> 		void *buf_addr = elts[i]->buf_addr;
>
> 		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
> 					      RTE_PKTMBUF_HEADROOM);
> 	}
> }
> What do you think?

Agree. Loop inside the conditional should be more efficient.
> Also, we should check the performance on other archs is not affected.

I will also test on x86 platform that I have.

>
> With best regards,
> Slava
>
> > -----Original Message-----
> > From: Ruifeng Wang <ruifeng.w...@arm.com>
> > Sent: Tuesday, June 1, 2021 11:31
> > To: Raslan Darawsheh <rasl...@nvidia.com>; Matan Azrad
> > <ma...@nvidia.com>; Shahaf Shuler <shah...@nvidia.com>; Slava
> > Ovsiienko <viachesl...@nvidia.com>
> > Cc: dev@dpdk.org; jer...@marvell.com; n...@arm.com;
> > honnappa.nagaraha...@arm.com; Ruifeng Wang <ruifeng.w...@arm.com>
> > Subject: [PATCH 2/2] net/mlx5: reduce unnecessary memory access
> >
> > MR btree len is a constant during Rx replenish.
> > Moved retrieve of the value out of loop to reduce data loads.
> > Slight performance uplift was measured on N1SDP.
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > ---
> >  drivers/net/mlx5/mlx5_rxtx_vec.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
> > index d5af2d91ff..fc7e2a7f41 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx_vec.c
> > +++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
> > @@ -95,6 +95,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
> >  	volatile struct mlx5_wqe_data_seg *wq =
> >  		&((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[elts_idx];
> >  	unsigned int i;
> > +	uint16_t btree_len;
> >
> >  	if (n >= rxq->rq_repl_thresh) {
> >  		MLX5_ASSERT(n >= MLX5_VPMD_RXQ_RPLNSH_THRESH(q_n));
> > @@ -106,6 +107,8 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
> >  		rxq->stats.rx_nombuf += n;
> >  		return;
> >  	}
> > +
> > +	btree_len = mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh);
> >  	for (i = 0; i < n; ++i) {
> >  		void *buf_addr;
> >
> > @@ -119,8 +122,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
> >  		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
> >  					      RTE_PKTMBUF_HEADROOM);
> >  		/* If there's a single MR, no need to replace LKey. */
> > -		if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh)
> > -			     > 1))
> > +		if (unlikely(btree_len > 1))
> >  			wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
> >  	}
> >  	rxq->rq_ci += n;
> > --
> > 2.25.1
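
For anyone who wants to compare the two shapes discussed in this thread outside the driver, here is a minimal standalone sketch. It is not the mlx5 code. `struct seg`, `struct buf`, `struct queue`, `lookup_lkey()` and `HEADROOM` are hypothetical stand-ins for `mlx5_wqe_data_seg`, `rte_mbuf`, the MR cache length, `mlx5_rx_mb2mr()` and `RTE_PKTMBUF_HEADROOM`, and the byte-order conversion is left out. `fill_hoisted()` follows the patch (length read once, test kept in the loop) and `fill_unswitched()` follows the suggestion to put the loop inside the conditional.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are illustrative stand-ins, not real driver definitions. */
#define HEADROOM 128u /* plays the role of RTE_PKTMBUF_HEADROOM */
#define NB_ELTS 4u

struct seg { uint64_t addr; uint32_t lkey; };       /* descriptor entry */
struct buf { uintptr_t buf_addr; uint32_t mr_id; }; /* packet buffer */
struct queue { uint16_t btree_len; };               /* MR cache length only */

/* Hypothetical lookup: derives a key from the buffer's region id. */
static uint32_t lookup_lkey(const struct queue *q, const struct buf *b)
{
	(void)q;
	return 0x1000u + b->mr_id;
}

/* Patch shape: load the loop-invariant length once, test it per iteration. */
static void fill_hoisted(const struct queue *q, struct seg *wq,
			 struct buf *const *elts, unsigned int n)
{
	const uint16_t btree_len = q->btree_len;
	unsigned int i;

	assert(wq != NULL && elts != NULL);
	for (i = 0; i < n; ++i) {
		wq[i].addr = (uint64_t)elts[i]->buf_addr + HEADROOM;
		if (btree_len > 1)
			wq[i].lkey = lookup_lkey(q, elts[i]);
	}
}

/* Suggested shape: test once, then run a branch-free loop for each case. */
static void fill_unswitched(const struct queue *q, struct seg *wq,
			    struct buf *const *elts, unsigned int n)
{
	unsigned int i;

	assert(wq != NULL && elts != NULL);
	if (q->btree_len > 1) {
		for (i = 0; i < n; ++i) {
			wq[i].addr = (uint64_t)elts[i]->buf_addr + HEADROOM;
			wq[i].lkey = lookup_lkey(q, elts[i]);
		}
	} else {
		for (i = 0; i < n; ++i)
			wq[i].addr = (uint64_t)elts[i]->buf_addr + HEADROOM;
	}
}

/* Returns 1 if both shapes write identical entries for this length. */
int variants_agree(uint16_t btree_len)
{
	struct buf bufs[NB_ELTS];
	struct buf *elts[NB_ELTS];
	struct seg a[NB_ELTS], b[NB_ELTS];
	struct queue q = { btree_len };
	unsigned int i;

	for (i = 0; i < NB_ELTS; ++i) {
		bufs[i].buf_addr = (uintptr_t)(0x10000u + 0x800u * i);
		bufs[i].mr_id = i;
		elts[i] = &bufs[i];
		/* Preset lkey so an untouched entry is distinguishable. */
		a[i].addr = b[i].addr = 0;
		a[i].lkey = b[i].lkey = 0xAAAAu;
	}
	fill_hoisted(&q, a, elts, NB_ELTS);
	fill_unswitched(&q, b, elts, NB_ELTS);
	for (i = 0; i < NB_ELTS; ++i)
		if (a[i].addr != b[i].addr || a[i].lkey != b[i].lkey)
			return 0;
	return 1;
}
```

Calling `variants_agree(1)` and `variants_agree(2)` exercises the single-MR and multi-MR cases. The sketch only checks that the two shapes are functionally equivalent; whether the unswitched form is actually faster still has to be measured on each target, as noted above for N1SDP and x86.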