Hi, Morten

> -----Original Message-----
> From: Morten Brørup <m...@smartsharesystems.com>
> Sent: Thursday, February 9, 2023 5:34 PM
> To: Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>;
> yuying.zh...@intel.com; beilei.x...@intel.com; olivier.m...@6wind.com;
> andrew.rybche...@oktetlabs.ru; bruce.richard...@intel.com;
> konstantin.anan...@huawei.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Cc: dev@dpdk.org; nd <n...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; Feifei Wang <feifei.wa...@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: replace put function
>
> > From: Kamalakshitha Aligeri [mailto:kamalakshitha.alig...@arm.com]
> > Sent: Thursday, 9 February 2023 07.25
> >
> > Integrated the mempool cache zero-copy put API into the i40e PMD.
> > On an Ampere Altra server, l3fwd single-core performance improves
> > by 5% with the new API.
> >
> > Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > Reviewed-by: Feifei Wang <feifei.wa...@arm.com>
> > ---
> > Link:
> > https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-
> > m...@smartsharesystems.com/
> >
> >  .mailmap                                |  1 +
> >  drivers/net/i40e/i40e_rxtx_vec_common.h | 34 ++++++++++++++++++++-----
> >  2 files changed, 28 insertions(+), 7 deletions(-)
> >
> > diff --git a/.mailmap b/.mailmap
> > index 75884b6fe2..05a42edbcf 100644
> > --- a/.mailmap
> > +++ b/.mailmap
> > @@ -670,6 +670,7 @@ Kai Ji <kai...@intel.com>
> >  Kaiwen Deng <kaiwenx.d...@intel.com>
> >  Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
> >  Kamalakannan R <kamalakanna...@intel.com>
> > +Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
> >  Kamil Bednarczyk <kamil.bednarc...@intel.com>
> >  Kamil Chalupnik <kamilx.chalup...@intel.com>
> >  Kamil Rytarowski <kamil.rytarow...@caviumnetworks.com>
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > index fe1a6ec75e..80d4a159e6 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> >
> >  	n = txq->tx_rs_thresh;
> >
> > -	/* first buffer to free from S/W ring is at index
> > -	 * tx_next_dd - (tx_rs_thresh-1)
> > -	 */
> > +	/* first buffer to free from S/W ring is at index
> > +	 * tx_next_dd - (tx_rs_thresh-1)
> > +	 */
> >  	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> >
> >  	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> > -		for (i = 0; i < n; i++) {
> > -			free[i] = txep[i].mbuf;
> > -			/* no need to reset txep[i].mbuf in vector path */
> > +		struct rte_mempool *mp = txep[0].mbuf->pool;
> > +		struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> > +
> > +		if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
>
> If the mempool has a cache, do not compare n to
> RTE_MEMPOOL_CACHE_MAX_SIZE. Instead, call
> rte_mempool_cache_zc_put_bulk() to determine if n is acceptable for
> zero-copy.
>
> It looks like this patch behaves incorrectly if the cache is configured
> to be smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Let's say the cache size
> is 8, which will make the flush threshold 12. If n is 32, your code will
> not enter this branch, but proceed to call
> rte_mempool_cache_zc_put_bulk(), which will return NULL, and then you
> will goto done.
>
> Obviously, if there is no cache, fall back to the standard
> rte_mempool_put_bulk().

Agreed. I think we ignored the case where
(cache->flushthresh < n < RTE_MEMPOOL_CACHE_MAX_SIZE).

Our goal is: if (!cache || n > cache->flushthresh), we can put the
buffers back into the mempool directly. So maybe we can change it as
follows:

struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());

if (!cache || n > cache->flushthresh) {
	for (i = 0; i < n; i++)
		free[i] = txep[i].mbuf;
	if (!cache) {
		rte_mempool_generic_put(mp, (void **)free, n, NULL);
		goto done;
	} else {
		rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
		goto done;
	}
}

Does this change look correct to you?

> > +			for (i = 0; i < n ; i++)
> > +				free[i] = txep[i].mbuf;
> > +			if (!cache) {
> > +				rte_mempool_generic_put(mp, (void **)free, n, cache);
> > +				goto done;
> > +			}
> > +			if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > +				rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
> > +				goto done;
> > +			}
> > +		}
> > +		void **cache_objs;
> > +
> > +		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > +		if (cache_objs) {
> > +			for (i = 0; i < n; i++) {
> > +				cache_objs[i] = txep->mbuf;
> > +				/* no need to reset txep[i].mbuf in vector path */
> > +				txep++;
> > +			}
> >  		}
> > -		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
> >  		goto done;
> >  	}
> >
> > --
> > 2.25.1
> >
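
To make the suggestion concrete, below is a rough sketch of the whole
fast-free branch with the above change folded in. This is untested; it
assumes the rte_mempool_cache_zc_put_bulk(cache, mp, n) signature from
the linked mempool patch, and note that it reads cache->flushthresh,
which is an mempool-internal field rather than part of the public API:

	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
		struct rte_mempool *mp = txep[0].mbuf->pool;
		struct rte_mempool_cache *cache =
			rte_mempool_default_cache(mp, rte_lcore_id());
		void **cache_objs;

		if (!cache || n > cache->flushthresh) {
			/* No cache, or n exceeds the flush threshold:
			 * the zero-copy path cannot take the burst, so
			 * hand the mbufs to the mempool directly.
			 */
			for (i = 0; i < n; i++)
				free[i] = txep[i].mbuf;
			if (!cache)
				rte_mempool_generic_put(mp, (void **)free, n, NULL);
			else
				rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
			goto done;
		}

		/* Zero-copy: reserve n slots in the cache and fill them
		 * straight from the S/W ring, skipping the free[] bounce
		 * buffer.
		 */
		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
		if (cache_objs == NULL) {
			/* Defensive only: should not happen, since
			 * n <= cache->flushthresh is guaranteed above.
			 */
			for (i = 0; i < n; i++)
				free[i] = txep[i].mbuf;
			rte_mempool_generic_put(mp, (void **)free, n, cache);
			goto done;
		}
		for (i = 0; i < n; i++) {
			cache_objs[i] = txep[i].mbuf;
			/* no need to reset txep[i].mbuf in vector path */
		}
		goto done;
	}

This keeps your point intact: the only gate for the zero-copy path is
whether the cache can take n objects, and every other combination falls
back to a plain bulk enqueue.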