To keep ordering of mixed accesses, rte_cio is sufficient.
The rte_io barrier inside the I40E_PCI_REG_WRITE is overkill.[1]

This patch fixes by replacing with just sufficient barriers in the
normal PMD and vPMD.

It showed 7% performance uplift on ThunderX2 and 4% on Arm N1SDP.
The test case is the RFC2544 zero-loss test running testpmd.

[1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9
qf0kpn89emdgdajepk...@mail.gmail.com

Fixes: 4861cde46116 ("i40e: new poll mode driver")
Cc: sta...@dpdk.org

Signed-off-by: Gavin Hu <gavin...@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c 
b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index deb185fe2..4376d8911 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -72,8 +72,9 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
        rx_id = (uint16_t)((rxq->rxrearm_start == 0) ?
                             (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1));
 
+       rte_cio_wmb();
        /* Update the tail pointer on the NIC */
-       I40E_PCI_REG_WRITE(rxq->qrx_tail, rx_id);
+       I40E_PCI_REG_WRITE_RELAXED(rxq->qrx_tail, rx_id);
 }
 
 static inline void
@@ -564,7 +565,8 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf 
**tx_pkts,
 
        txq->tx_tail = tx_id;
 
-       I40E_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail);
+       rte_cio_wmb();
+       I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id);
 
        return nb_pkts;
 }
-- 
2.17.1

Reply via email to