Tx mbuf free is a hotspot for i40e on aarch64. As there are no dependencies between loop iterations, it is safe to enable auto-vectorization to speed it up.
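
For illustration only, here is a minimal sketch of the same pragma pattern on a standalone loop. The function add_arrays is hypothetical and is not part of the i40e driver. It assumes the caller guarantees that dst and src do not overlap, which is the kind of promise these pragmas make to the compiler:

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the i40e driver.
 * Assumption: dst and src never overlap, so no iteration depends on
 * another. The pragmas pass that promise to the compiler, which may
 * then vectorize the loop without proving independence itself.
 */
static void add_arrays(int *dst, const int *src, size_t n)
{
	size_t i;

#if defined(__clang__)
#pragma clang loop vectorize(assume_safety)
#elif defined(__GNUC__)
#pragma GCC ivdep
#endif
	for (i = 0; i < n; i++)
		dst[i] += src[i];
}
```

The pragmas are hints, not requirements: the compiler may still choose not to vectorize, and if the no-dependency promise is false the result is undefined.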

This patch showed a 2~3% performance lift on ThunderX2 and no degradation on Arm N1SDP. The test case is a single-core RFC2544 zero-loss test.

Signed-off-by: Gavin Hu <gavin...@arm.com>
Reviewed-by: Steve Capper <steve.cap...@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_common.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 0e6ffa007..fc0fa45d4 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -98,6 +98,11 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 	if (likely(m != NULL)) {
 		free[0] = m;
 		nb_free = 1;
+#if defined(__clang__)
+#pragma clang loop vectorize(assume_safety)
+#elif defined(__GNUC__)
+#pragma GCC ivdep
+#endif
 		for (i = 1; i < n; i++) {
 			m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
 			if (likely(m != NULL)) {
-- 
2.17.1