> -----Original Message-----
> From: Morten Brørup <m...@smartsharesystems.com>
> Sent: Thursday, February 9, 2023 7:31 PM
> To: Feifei Wang <feifei.wa...@arm.com>; Kamalakshitha Aligeri
> <kamalakshitha.alig...@arm.com>; yuying.zh...@intel.com;
> beilei.x...@intel.com; olivier.m...@6wind.com;
> andrew.rybche...@oktetlabs.ru; bruce.richard...@intel.com;
> konstantin.anan...@huawei.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Cc: dev@dpdk.org; nd <n...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: replace put function
>
> > From: Feifei Wang [mailto:feifei.wa...@arm.com]
> > Sent: Thursday, 9 February 2023 11.59
> >
> > Hi, Morten
> >
> > > From: Morten Brørup <m...@smartsharesystems.com>
> > > Sent: Thursday, February 9, 2023 5:34 PM
> > >
> > > > From: Kamalakshitha Aligeri [mailto:kamalakshitha.alig...@arm.com]
> > > > Sent: Thursday, 9 February 2023 07.25
> > > >
> > > > Integrated zero-copy put API in mempool cache in i40e PMD.
> > > > On an Ampere Altra server, l3fwd single-core performance improves
> > > > by 5% with the new API.
> > > >
> > > > Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > > > Reviewed-by: Feifei Wang <feifei.wa...@arm.com>
> > > > ---
> > > > Link:
> > > > https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-m...@smartsharesystems.com/
> > > >
> > > >  .mailmap                                |  1 +
> > > >  drivers/net/i40e/i40e_rxtx_vec_common.h | 34 ++++++++++++++++++++-----
> > > >  2 files changed, 28 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/.mailmap b/.mailmap
> > > > index 75884b6fe2..05a42edbcf 100644
> > > > --- a/.mailmap
> > > > +++ b/.mailmap
> > > > @@ -670,6 +670,7 @@ Kai Ji <kai...@intel.com>
> > > >  Kaiwen Deng <kaiwenx.d...@intel.com>
> > > >  Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
> > > >  Kamalakannan R <kamalakanna...@intel.com>
> > > > +Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
> > > >  Kamil Bednarczyk <kamil.bednarc...@intel.com>
> > > >  Kamil Chalupnik <kamilx.chalup...@intel.com>
> > > >  Kamil Rytarowski <kamil.rytarow...@caviumnetworks.com>
> > > > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > index fe1a6ec75e..80d4a159e6 100644
> > > > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> > > >
> > > >  	n = txq->tx_rs_thresh;
> > > >
> > > > -	/* first buffer to free from S/W ring is at index
> > > > -	 * tx_next_dd - (tx_rs_thresh-1)
> > > > -	 */
> > > > +	/* first buffer to free from S/W ring is at index
> > > > +	 * tx_next_dd - (tx_rs_thresh-1)
> > > > +	 */
> > > >  	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> > > >
> > > >  	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> > > > -		for (i = 0; i < n; i++) {
> > > > -			free[i] = txep[i].mbuf;
> > > > -			/* no need to reset txep[i].mbuf in vector path */
> > > > +		struct rte_mempool *mp = txep[0].mbuf->pool;
> > > > +		struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> > > > +
> > > > +		if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > >
> > > If the mempool has a cache, do not compare n to
> > > RTE_MEMPOOL_CACHE_MAX_SIZE. Instead, call
> > > rte_mempool_cache_zc_put_bulk() to determine if n is acceptable for
> > > zero-copy.
> > >
> > > It looks like this patch behaves incorrectly if the cache is configured
> > > to be smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Let's say the cache size
> > > is 8, which will make the flush threshold 12. If n is 32, your code will
> > > not enter this branch, but proceed to call rte_mempool_cache_zc_put_bulk(),
> > > which will return NULL, and then you will goto done.
> > >
> > > Obviously, if there is no cache, fall back to the standard
> > > rte_mempool_put_bulk().
> >
> > Agreed. I think we ignored the case where
> > (cache->flushthresh < n < RTE_MEMPOOL_CACHE_MAX_SIZE).
> >
> > Our goal is that if (!cache || n > cache->flushthresh), we can put
> > the buffers into the mempool directly.
> >
> > Thus maybe we can change it to something like:
> >
> > struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> > if (!cache || n > cache->flushthresh) {
> > 	for (i = 0; i < n; i++)
> > 		free[i] = txep[i].mbuf;
> > 	if (!cache) {
> > 		rte_mempool_generic_put(mp, (void **)free, n, cache);
> > 		goto done;
> > 	} else {
> > 		rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
> > 		goto done;
> > 	}
> > }
> >
> > Would a change like this work?
>
> Since I consider "flushthreshold" private to the cache structure, it shouldn't
> be accessed directly. If its meaning changes in the future, you will have to
> rewrite the PMD code again.

OK, agreed. Accessing the cache's private variables needs to be treated with caution.

> Use the mempool API instead of accessing the mempool structures directly.
> (Yeah, I know the mempool and mempool cache structures are not marked as
> internal, and thus formally public, but I still dislike accessing their
> internals from outside the mempool library.)
>
> I would change to something like:
>
> struct rte_mempool_cache *cache;
> void **cache_objs;
>
> cache = rte_mempool_default_cache(mp, rte_lcore_id());
> if (unlikely(cache == NULL))
> 	goto fallback;
>
> /* Try zero-copy put. */
> cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> if (unlikely(cache_objs == NULL))
> 	goto fallback;
>
> /* Zero-copy put. */
> /* no need to reset txep[i].mbuf in vector path */
> for (i = 0; i < n; i++)
> 	cache_objs[i] = txep[i].mbuf;
> goto done;
>
> fallback:
> /* Ordinary put. */
> /* no need to reset txep[i].mbuf in vector path */

This note should be deleted, because here we do need to reset txep[i].mbuf.

> for (i = 0; i < n ; i++)
> 	free[i] = txep[i].mbuf;
> rte_mempool_generic_put(mp, free, n, cache);
> goto done;
>

Agreed with this change; just some minor comments on the notes in the 'fallback' path, above.

> > > > +			for (i = 0; i < n ; i++)
> > > > +				free[i] = txep[i].mbuf;
> > > > +			if (!cache) {
> > > > +				rte_mempool_generic_put(mp, (void **)free, n, cache);
> > > > +				goto done;
> > > > +			}
> > > > +			if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > > > +				rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
> > > > +				goto done;
> > > > +			}
> > > > +		}
> > > > +		void **cache_objs;
> > > > +
> > > > +		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > > +		if (cache_objs) {
> > > > +			for (i = 0; i < n; i++) {
> > > > +				cache_objs[i] = txep->mbuf;
> > > > +				/* no need to reset txep[i].mbuf in vector path */
> > > > +				txep++;
> > > > +			}
> > > >  		}
> > > > -		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
> > > >  		goto done;
> > > >  	}
> > > >
> > > > --
> > > > 2.25.1
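
To make the direction agreed above concrete, here is a minimal sketch of how the fast-free branch of i40e_tx_free_bufs() could look if it adopts the structure Morten suggests. It is an illustration only, not the submitted patch: it reuses the locals already present in the function (txq, txep, free, n, i) and the existing done: label, assumes the experimental rte_mempool_cache_zc_put_bulk() API from the mempool library, and borrows the fallback: label from the suggestion.

	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
		struct rte_mempool *mp = txep[0].mbuf->pool;
		struct rte_mempool_cache *cache =
			rte_mempool_default_cache(mp, rte_lcore_id());
		void **cache_objs;

		/* No per-lcore cache for this mempool: use the ordinary bulk put. */
		if (unlikely(cache == NULL))
			goto fallback;

		/* Try zero-copy put; returns NULL if n is too big for the cache
		 * (i.e. exceeds its flush threshold).
		 */
		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
		if (unlikely(cache_objs == NULL))
			goto fallback;

		/* Zero-copy put: write the mbuf pointers straight into the cache. */
		/* no need to reset txep[i].mbuf in vector path */
		for (i = 0; i < n; i++)
			cache_objs[i] = txep[i].mbuf;
		goto done;

fallback:
		/* Ordinary put: stage the mbufs locally, then hand them to the
		 * mempool; rte_mempool_generic_put() accepts cache == NULL.
		 */
		for (i = 0; i < n; i++)
			free[i] = txep[i].mbuf;
		rte_mempool_generic_put(mp, (void **)free, n, cache);
		goto done;
	}

Whether the fallback path must also reset txep[i].mbuf, as raised in the comments above, is left open in this sketch.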