To keep ordering of mixed accesses, rte_cio is sufficient. The rte_io barrier inside the I40E_PCI_REG_WRITE is overkill.[1]
This patch fixes by replacing with just sufficient barriers in the normal PMD and vPMD. It showed 7% performance uplift on ThunderX2 and 4% on Arm N1SDP. The test case is the RFC2544 zero-loss test running testpmd. [1] http://inbox.dpdk.org/dev/CALBAE1M-ezVWCjqCZDBw+MMDEC4O9 qf0kpn89emdgdajepk...@mail.gmail.com Fixes: 4861cde46116 ("i40e: new poll mode driver") Cc: sta...@dpdk.org Signed-off-by: Gavin Hu <gavin...@arm.com> --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index deb185fe2..4376d8911 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -72,8 +72,9 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) rx_id = (uint16_t)((rxq->rxrearm_start == 0) ? (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1)); + rte_cio_wmb(); /* Update the tail pointer on the NIC */ - I40E_PCI_REG_WRITE(rxq->qrx_tail, rx_id); + I40E_PCI_REG_WRITE_RELAXED(rxq->qrx_tail, rx_id); } static inline void @@ -564,7 +565,8 @@ i40e_xmit_fixed_burst_vec(void *tx_queue, struct rte_mbuf **tx_pkts, txq->tx_tail = tx_id; - I40E_PCI_REG_WRITE(txq->qtx_tail, txq->tx_tail); + rte_cio_wmb(); + I40E_PCI_REG_WRITE_RELAXED(txq->qtx_tail, tx_id); return nb_pkts; } -- 2.17.1