> -----Original Message-----
> From: Morten Brørup <m...@smartsharesystems.com>
> Sent: Thursday, February 9, 2023 7:31 PM
> To: Feifei Wang <feifei.wa...@arm.com>; Kamalakshitha Aligeri
> <kamalakshitha.alig...@arm.com>; yuying.zh...@intel.com;
> beilei.x...@intel.com; olivier.m...@6wind.com;
> andrew.rybche...@oktetlabs.ru; bruce.richard...@intel.com;
> konstantin.anan...@huawei.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Cc: dev@dpdk.org; nd <n...@arm.com>; Ruifeng Wang
> <ruifeng.w...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: replace put function
> 
> > From: Feifei Wang [mailto:feifei.wa...@arm.com]
> > Sent: Thursday, 9 February 2023 11.59
> >
> > Hi, Morten
> >
> > > From: Morten Brørup <m...@smartsharesystems.com>
> > > Sent: Thursday, February 9, 2023 5:34 PM
> > >
> > > > From: Kamalakshitha Aligeri [mailto:kamalakshitha.alig...@arm.com]
> > > > Sent: Thursday, 9 February 2023 07.25
> > > >
> > > > Integrated zero-copy put API in mempool cache in i40e PMD.
> > > > On Ampere Altra server, l3fwd single core's performance improves
> > > > by
> > 5%
> > > > with the new API
> > > >
> > > > Signed-off-by: Kamalakshitha Aligeri
> > <kamalakshitha.alig...@arm.com>
> > > > Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> > > > Reviewed-by: Feifei Wang <feifei.wa...@arm.com>
> > > > ---
> > > > Link:
> > > > https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-m...@smartsharesystems.com/
> > > >
> > > >  .mailmap                                |  1 +
> > > >  drivers/net/i40e/i40e_rxtx_vec_common.h | 34 ++++++++++++++++++++-----
> > > >  2 files changed, 28 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/.mailmap b/.mailmap
> > > > index 75884b6fe2..05a42edbcf 100644
> > > > --- a/.mailmap
> > > > +++ b/.mailmap
> > > > @@ -670,6 +670,7 @@ Kai Ji <kai...@intel.com>
> > > >  Kaiwen Deng <kaiwenx.d...@intel.com>
> > > >  Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
> > > >  Kamalakannan R <kamalakanna...@intel.com>
> > > > +Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
> > > >  Kamil Bednarczyk <kamil.bednarc...@intel.com>
> > > >  Kamil Chalupnik <kamilx.chalup...@intel.com>
> > > >  Kamil Rytarowski <kamil.rytarow...@caviumnetworks.com>
> > > > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > index fe1a6ec75e..80d4a159e6 100644
> > > > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> > > >
> > > >         n = txq->tx_rs_thresh;
> > > >
> > > > -        /* first buffer to free from S/W ring is at index
> > > > -         * tx_next_dd - (tx_rs_thresh-1)
> > > > -         */
> > > > +       /* first buffer to free from S/W ring is at index
> > > > +        * tx_next_dd - (tx_rs_thresh-1)
> > > > +        */
> > > >         txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> > > >
> > > >         if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> > > > -               for (i = 0; i < n; i++) {
> > > > -                       free[i] = txep[i].mbuf;
> > > > -                       /* no need to reset txep[i].mbuf in vector path */
> > > > +               struct rte_mempool *mp = txep[0].mbuf->pool;
> > > > +               struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> > > > +
> > > > +               if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > >
> > > If the mempool has a cache, do not compare n to
> > > RTE_MEMPOOL_CACHE_MAX_SIZE. Instead, call
> > > rte_mempool_cache_zc_put_bulk() to determine if n is acceptable for
> > > zero-copy.
> > >
> >
> > > It looks like this patch behaves incorrectly if the cache is configured
> > > to be smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Let's say the cache size
> > > is 8, which will make the flush threshold 12. If n is 32, your code will
> > > not enter this branch, but proceed to call
> > > rte_mempool_cache_zc_put_bulk(), which will return NULL, and then you
> > > will goto done.
> > >
> > > Obviously, if there is no cache, fall back to the standard
> > > rte_mempool_put_bulk().
> >
> > Agree with this. I think we ignored the case where (cache->flushthresh
> > < n < RTE_MEMPOOL_CACHE_MAX_SIZE).
> >
> > Our goal is: if (!cache || n > cache->flushthresh), we can put the
> > buffers into the mempool directly.
> >
> > Thus maybe we can change it to something like:
> >
> > struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> >
> > if (!cache || n > cache->flushthresh) {
> >       for (i = 0; i < n; i++)
> >               free[i] = txep[i].mbuf;
> >       if (!cache) {
> >               rte_mempool_generic_put(mp, (void **)free, n, cache);
> >               goto done;
> >       } else {
> >               rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
> >               goto done;
> >       }
> > }
> >
> > Can we change it like this?
> 
> Since I consider "flushthreshold" private to the cache structure, it shouldn't
> be accessed directly. If its meaning changes in the future, you will have to
> rewrite the PMD code again. 
OK, agreed. Accessing the cache's private variables needs to be treated with
caution, so we should avoid it here.

> Use the mempool API instead of accessing the
> mempool structures directly. (Yeah, I know the mempool and mempool
> cache structures are not marked as internal, and thus formally public, but I
> still dislike accessing their internals from outside the mempool library.)
> 
> I would change to something like:
> 
> struct rte_mempool_cache *cache;
> void **cache_objs;
> 
> cache = rte_mempool_default_cache(mp, rte_lcore_id());
> if (unlikely(cache == NULL))
>       goto fallback;
> 
> /* Try zero-copy put. */
> cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> if (unlikely(cache_objs == NULL))
>       goto fallback;
> 
> /* Zero-copy put. */
> /* no need to reset txep[i].mbuf in vector path */
> for (i = 0; i < n; i++)
>       cache_objs[i] = txep[i].mbuf;
> goto done;
> 
> fallback:
> /* Ordinary put. */
> /* no need to reset txep[i].mbuf in vector path */ 
This comment should be deleted, because in this fallback path we do need to reset
txep[i].mbuf.

> for (i = 0; i < n; i++)
>       free[i] = txep[i].mbuf;
> rte_mempool_generic_put(mp, free, n, cache);
> goto done;
> 
> 
I agree with this change; just the minor comment above about the note in the
'fallback' path.
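
To double-check my understanding of the combined flow, below is a rough sketch of
how the whole RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE branch in i40e_tx_free_bufs()
could look with your suggestion, plus the mbuf reset in the fallback path as noted
above. This is only a sketch assuming the zero-copy API from your mempool patch;
'txep', 'free', 'n' and 'i' are the existing local variables of the function:

	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
		struct rte_mempool *mp = txep[0].mbuf->pool;
		struct rte_mempool_cache *cache;
		void **cache_objs;

		cache = rte_mempool_default_cache(mp, rte_lcore_id());
		if (unlikely(cache == NULL))
			goto fallback;

		/* Try zero-copy put; NULL means n mbufs do not fit in the cache. */
		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
		if (unlikely(cache_objs == NULL))
			goto fallback;

		/* Zero-copy put. */
		/* no need to reset txep[i].mbuf in vector path */
		for (i = 0; i < n; i++)
			cache_objs[i] = txep[i].mbuf;
		goto done;

fallback:
		/* Ordinary put; reset txep[i].mbuf here, as noted above. */
		for (i = 0; i < n; i++) {
			free[i] = txep[i].mbuf;
			txep[i].mbuf = NULL;
		}
		rte_mempool_generic_put(mp, (void **)free, n, cache);
		goto done;
	}

This way the PMD never reads cache->flushthresh; whether n fits is decided
entirely inside rte_mempool_cache_zc_put_bulk().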

> >
> > >
> > > > +                       for (i = 0; i < n ; i++)
> > > > +                               free[i] = txep[i].mbuf;
> > > > +                       if (!cache) {
> > > > +                               rte_mempool_generic_put(mp, (void **)free, n, cache);
> > > > +                               goto done;
> > > > +                       }
> > > > +                       if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> > > > +                               rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
> > > > +                               goto done;
> > > > +                       }
> > > > +               }
> > > > +               void **cache_objs;
> > > > +
> > > > +               cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> > > > +               if (cache_objs) {
> > > > +                       for (i = 0; i < n; i++) {
> > > > +                               cache_objs[i] = txep->mbuf;
> > > > +                               /* no need to reset txep[i].mbuf in vector path */
> > > > +                               txep++;
> > > > +                       }
> > > >                 }
> > > > -               rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
> > > >                 goto done;
> > > >         }
> > > >
> > > > --
> > > > 2.25.1
> > > >
