> From: Kamalakshitha Aligeri [mailto:kamalakshitha.alig...@arm.com]
> Sent: Thursday, 9 February 2023 07.25
> 
> Integrated zero-copy put API in mempool cache in i40e PMD.
> On Ampere Altra server, l3fwd single core's performance improves by 5%
> with the new API
> 
> Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> Reviewed-by: Feifei Wang <feifei.wa...@arm.com>
> ---
> Link: https://patchwork.dpdk.org/project/dpdk/patch/20221227151700.80887-1-m...@smartsharesystems.com/
> 
>  .mailmap                                |  1 +
>  drivers/net/i40e/i40e_rxtx_vec_common.h | 34 ++++++++++++++++++++-----
>  2 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/.mailmap b/.mailmap
> index 75884b6fe2..05a42edbcf 100644
> --- a/.mailmap
> +++ b/.mailmap
> @@ -670,6 +670,7 @@ Kai Ji <kai...@intel.com>
>  Kaiwen Deng <kaiwenx.d...@intel.com>
>  Kalesh AP <kalesh-anakkur.pura...@broadcom.com>
>  Kamalakannan R <kamalakanna...@intel.com>
> +Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
>  Kamil Bednarczyk <kamil.bednarc...@intel.com>
>  Kamil Chalupnik <kamilx.chalup...@intel.com>
>  Kamil Rytarowski <kamil.rytarow...@caviumnetworks.com>
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
> index fe1a6ec75e..80d4a159e6 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> @@ -95,17 +95,37 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> 
>       n = txq->tx_rs_thresh;
> 
> -      /* first buffer to free from S/W ring is at index
> -       * tx_next_dd - (tx_rs_thresh-1)
> -       */
> +     /* first buffer to free from S/W ring is at index
> +      * tx_next_dd - (tx_rs_thresh-1)
> +      */
>       txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
> 
>       if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
> -             for (i = 0; i < n; i++) {
> -                     free[i] = txep[i].mbuf;
> -                     /* no need to reset txep[i].mbuf in vector path */
> +             struct rte_mempool *mp = txep[0].mbuf->pool;
> +             struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
> +
> +             if (!cache || n > RTE_MEMPOOL_CACHE_MAX_SIZE) {

If the mempool has a cache, do not compare n to RTE_MEMPOOL_CACHE_MAX_SIZE.
Instead, call rte_mempool_cache_zc_put_bulk() and use its return value to
determine whether n is acceptable for zero-copy.

It looks like this patch behaves incorrectly if the cache is configured to be
smaller than RTE_MEMPOOL_CACHE_MAX_SIZE. Let's say the cache size is 8, which
makes the flush threshold 12. If n is 32, your code will not enter the branch
above, but proceed to call rte_mempool_cache_zc_put_bulk(), which will return
NULL because 32 exceeds the flush threshold, and then you will goto done
without ever freeing the mbufs.
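
Just to spell out the arithmetic (a stand-alone sketch with assumed values;
the 1.5x flush-threshold multiplier matches rte_mempool.c, and I am assuming
the usual RTE_MEMPOOL_CACHE_MAX_SIZE of 512):

#include <assert.h>

int main(void)
{
	unsigned int cache_size = 8;                   /* assumed per-lcore cache size */
	unsigned int flushthresh = cache_size * 3 / 2; /* 1.5x multiplier -> 12 */
	unsigned int n = 32;                           /* tx_rs_thresh in this example */

	assert(n <= 512);        /* <= RTE_MEMPOOL_CACHE_MAX_SIZE, so the branch above is skipped */
	assert(n > flushthresh); /* so rte_mempool_cache_zc_put_bulk() returns NULL */
	return 0;                /* ...and the mbufs are never freed before "goto done" */
}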

Obviously, if there is no cache, fall back to the standard 
rte_mempool_put_bulk().
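
Something along the lines below is what I have in mind. It is an untested
sketch only, reusing the variable names and the "done" label from the
surrounding i40e_tx_free_bufs(), and it keeps rte_mempool_put_bulk() as the
fallback whenever zero-copy is not possible:

	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
		struct rte_mempool *mp = txep[0].mbuf->pool;
		struct rte_mempool_cache *cache =
				rte_mempool_default_cache(mp, rte_lcore_id());
		void **cache_objs;

		/* Try zero-copy only if this lcore has a cache. */
		if (cache != NULL) {
			cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
			if (cache_objs != NULL) {
				/* Copy the mbuf pointers straight into the cache. */
				for (i = 0; i < n; i++) {
					cache_objs[i] = txep[i].mbuf;
					/* no need to reset txep[i].mbuf in vector path */
				}
				goto done;
			}
		}

		/* No cache, or n too large for zero-copy: standard bulk put. */
		for (i = 0; i < n; i++)
			free[i] = txep[i].mbuf;
		rte_mempool_put_bulk(mp, (void **)free, n);
		goto done;
	}

Whether the fallback should go through the cache with rte_mempool_generic_put()
instead is a separate choice; the point is only that the zero-copy decision
comes from the return value of rte_mempool_cache_zc_put_bulk().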

> +                     for (i = 0; i < n ; i++)
> +                             free[i] = txep[i].mbuf;
> +                     if (!cache) {
> +                             rte_mempool_generic_put(mp, (void **)free, n, cache);
> +                             goto done;
> +                     }
> +                     if (n > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> +                             rte_mempool_ops_enqueue_bulk(mp, (void **)free, n);
> +                             goto done;
> +                     }
> +             }
> +             void **cache_objs;
> +
> +             cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
> +             if (cache_objs) {
> +                     for (i = 0; i < n; i++) {
> +                             cache_objs[i] = txep->mbuf;
> +                             /* no need to reset txep[i].mbuf in vector path */
> +                             txep++;
> +                     }
>               }
> -             rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
>               goto done;
>       }
> 
> --
> 2.25.1
> 
