Tx mbuf free is a hotspot for i40e on aarch64. As there are no
dependencies between loop iterations, it is safe to enable
auto-vectorization to speed it up.

This patch showed a 2~3% performance gain on ThunderX2 and no
degradation on Arm N1SDP. The test case is a single-core RFC2544
zero-loss test.

Signed-off-by: Gavin Hu <gavin...@arm.com>
Reviewed-by: Steve Capper <steve.cap...@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_common.h | 5 +++++
 1 file changed, 5 insertions(+)
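
For illustration only, a minimal standalone sketch of the hint pattern
used in this patch, outside the driver. The function add_offset() and
its parameters are hypothetical and are not part of i40e. Since dst and
src are plain pointers, the compiler cannot rule out overlap on its own;
the pragmas tell it to assume the iterations are independent. Clang is
tested first because it also defines __GNUC__.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example, not driver code: add a constant to every element.
 * The caller must guarantee that dst and src do not overlap in a way that
 * creates a dependency between iterations; the pragmas rely on it.
 */
void add_offset(unsigned int *dst, const unsigned int *src, size_t n,
		unsigned int offset)
{
	size_t i;

	assert(n == 0 || (dst != NULL && src != NULL));

	/* Ask the compiler to assume no loop-carried dependencies. */
#if defined(__clang__)
#pragma clang loop vectorize(assume_safety)
#elif defined(__GNUC__)
#pragma GCC ivdep
#endif
	for (i = 0; i < n; i++)
		dst[i] = src[i] + offset;
}
```

The pragmas are hints only: whether the loop is actually vectorized
still depends on the optimization level and the target, and other
compilers fall through the #if chain and build the loop unchanged.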

diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 0e6ffa007..fc0fa45d4 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -98,6 +98,11 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
        if (likely(m != NULL)) {
                free[0] = m;
                nb_free = 1;
+#if defined(__clang__)
+#pragma clang loop vectorize(assume_safety)
+#elif defined(__GNUC__)
+#pragma GCC ivdep
+#endif
                for (i = 1; i < n; i++) {
                        m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
                        if (likely(m != NULL)) {
-- 
2.17.1
