From: Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>

Integrate the mempool cache zero-copy put API into the i40e PMD's vector Tx buffer-free path, which is used when RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE is enabled.

On an Ampere Altra server, single-core l3fwd performance improves by 5% with the new API.
Signed-off-by: Kamalakshitha Aligeri <kamalakshitha.alig...@arm.com>
Signed-off-by: Dharmik Thakkar <dharmikjayesh.thak...@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
Reviewed-by: Feifei Wang <feifei.wa...@arm.com>
Acked-by: Morten Brørup <m...@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.v.anan...@yandex.ru>
---
 drivers/net/i40e/i40e_rxtx_vec_common.h | 29 ++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index fe1a6ec75ef5..bb746fc36823 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -85,7 +85,7 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 	uint32_t n;
 	uint32_t i;
 	int nb_free = 0;
-	struct rte_mbuf *m, *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
+	struct rte_mbuf *m, *free[RTE_I40E_TX_MAX_FREE_BUF_SZ] = {0};
 
 	/* check DD bits on threshold descriptor */
 	if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
@@ -95,18 +95,35 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 
 	n = txq->tx_rs_thresh;
 
-	 /* first buffer to free from S/W ring is at index
-	  * tx_next_dd - (tx_rs_thresh-1)
-	  */
+	/* first buffer to free from S/W ring is at index
+	 * tx_next_dd - (tx_rs_thresh-1)
+	 */
 	txep = &txq->sw_ring[txq->tx_next_dd - (n - 1)];
 
 	if (txq->offloads & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE) {
+		struct rte_mempool *mp = txep[0].mbuf->pool;
+		struct rte_mempool_cache *cache = rte_mempool_default_cache(mp, rte_lcore_id());
+		void **cache_objs;
+
+		if (unlikely(!cache))
+			goto fallback;
+
+		cache_objs = rte_mempool_cache_zc_put_bulk(cache, mp, n);
+		if (unlikely(!cache_objs))
+			goto fallback;
+
 		for (i = 0; i < n; i++) {
-			free[i] = txep[i].mbuf;
+			cache_objs[i] = txep[i].mbuf;
 			/* no need to reset txep[i].mbuf in vector path */
 		}
-		rte_mempool_put_bulk(free[0]->pool, (void **)free, n);
 		goto done;
+
+fallback:
+		for (i = 0; i < n; i++)
+			free[i] = txep[i].mbuf;
+		rte_mempool_generic_put(mp, (void **)free, n, cache);
+		goto done;
+
 	}
 
 	m = rte_pktmbuf_prefree_seg(txep[0].mbuf);
-- 
2.25.1