> From: Kamalakshitha Aligeri [mailto:kamalakshitha.alig...@arm.com]
> Sent: Thursday, 9 February 2023 07.25
>
> Integrated zero-copy put API in mempool cache in i40e PMD.
> On Ampere Altra server, l3fwd single core's performance improves by 5%
> with the new API.
>
> Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> Reviewed-by: Feifei Wang <feifei.wa...@arm.com>
> ---
> Link: https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-m...@smartsharesystems.com/
>
>  .mailmap                                |  1 +
>  drivers/net/i40e/i40e_rxtx_vec_common.h | 34 ++++++++++++++++++++-----
>  2 files changed, 28 insertions(+), 7 deletions(-)
>
> diff --git a/.mailmap b/.mailmap
> index 75884b6fe2..05a42edbcf 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -670,6 +670,7 @@ Kai Ji <kai...@intel.com>
>  Kaiwen Deng <kaiwenx.d...@intel.com>
>  Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
>  Kamalakannan R <kamalakanna...@intel.com>
> +Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
>  Kamil Bednarczyk <kamil.bednarc...@intel.com>
>  Kamil Chalupnik <kamilx.chalup...@intel.com>
>  Kamil Rytarowski <kamil.rytarow...@caviumnetworks.com>
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
> index fe1a6ec75e..80d4a159e6 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
>
>  	n = txq->tx_rs_thresh;
>
> -	/* first buffer to free from S/W ring is at index
> -	 * tx_next_dd - (tx_rs_thresh-1)
> -	 */
> +	/* first buffer to free from S/W ring is at index
> +	 * tx_next_dd - (tx_rs_thresh-1)
> +	 */
>  	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
>
>  	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> -		for (i = 0; i < n; i++) {
> -			free[i] = txep[i].mbuf;
> -			/* no need to reset txep[i].mbuf in vector path */
> +		struct rte_mempool *mp = txep[0].mbuf->pool;
> +		struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> +
> +		if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
If the mempool has a cache, do not compare n to RTE_MEMPOOL_CACHE_MAX_SIZE. Instead, call rte_mempool_cache_zc_put_bulk() and let its return value decide whether n is acceptable for zero-copy.

As written, the patch behaves incorrectly if the cache is configured to be smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Say the cache size is 8, which makes the flush threshold 12. If n is 32, your code will not enter this branch, but will proceed to call rte_mempool_cache_zc_put_bulk(), which will return NULL, and then you will goto done without the mbufs ever being returned to the mempool, i.e. they are leaked.

And obviously, if there is no cache, fall back to the standard rte_mempool_put_bulk(). See the sketch at the bottom of this mail.

> +			for (i = 0; i < n ; i++)
> +				free[i] = txep[i].mbuf;
> +			if (!cache) {
> +				rte_mempool_generic_put(mp, (void **)free, n, cache);
> +				goto done;
> +			}
> +			if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> +				rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
> +				goto done;
> +			}
> +		}
> +		void **cache_objs;
> +
> +		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> +		if (cache_objs) {
> +			for (i = 0; i < n; i++) {
> +				cache_objs[i] = txep->mbuf;
> +				/* no need to reset txep[i].mbuf in vector path */
> +				txep++;
> +			}
>  		}
> -		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
>  		goto done;
>  	}
>
> --
> 2.25.1
>
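To make the suggestion concrete, here is an untested sketch of the flow I have in mind, reusing the variables (mp, cache, cache_objs, txep, free, n, i) from your patch. Only the MBUF_FAST_FREE branch is shown; treat it as an illustration, not a finished implementation:

	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
		struct rte_mempool *mp = txep[0].mbuf->pool;
		struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
		void **cache_objs = NULL;

		/* Let the zero-copy API decide whether n is acceptable;
		 * it returns NULL if n does not fit in the cache.
		 */
		if (cache != NULL)
			cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);

		if (cache_objs != NULL) {
			/* Zero-copy put: store the mbufs directly in the cache. */
			for (i = 0; i < n; i++) {
				cache_objs[i] = txep[i].mbuf;
				/* no need to reset txep[i].mbuf in vector path */
			}
		} else {
			/* No cache, or n not acceptable for zero-copy:
			 * fall back to the standard bulk put.
			 */
			for (i = 0; i < n; i++)
				free[i] = txep[i].mbuf;
			rte_mempool_put_bulk(mp, (void **)free, n);
		}
		goto done;
	}

This keeps a single fallback path: rte_mempool_put_bulk() already handles both the no-cache case and n larger than the flush threshold, so the explicit rte_mempool_generic_put() and rte_mempool_ops_enqueue_bulk() calls become unnecessary.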