The Rx descriptor is 16B/32B in size and consists of multiple words.
The word that contains the DD field must be read first; a read result
with the DD bit set indicates that the rest of the descriptor is valid.

In the NEON vector PMD, a vector load fetches two contiguous 8B words
of descriptor data into a vector register. Since the vector load does
not guarantee 16B atomicity, the read of the word that contains the DD
field can be reordered after the reads of the other words. In that
case, the other words may contain stale, invalid data.

A read barrier is added after the read of qword1, which contains the
DD field, and qword0 is then reloaded to update the vector register.
This ensures that the fetched descriptor data is consistent.
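
For illustration, here is a minimal standalone sketch of the required
ordering using C11 atomics instead of the DPDK/NEON intrinsics. The
struct layout, names and DD bit position below are assumptions made for
the example, not the driver's definitions:

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Hypothetical 16B Rx descriptor, for illustration only. */
  struct rx_desc {
          uint64_t qword0;        /* buffer/metadata word */
          uint64_t qword1;        /* status word holding the DD bit */
  };

  #define DD_BIT (1ULL << 0)      /* assumed DD bit position */

  /* Returns true and copies a consistent descriptor when DD is set. */
  static inline bool
  read_rx_desc(volatile struct rx_desc *d, struct rx_desc *out)
  {
          uint64_t q1 = d->qword1;        /* read the DD word first */

          if (!(q1 & DD_BIT))
                  return false;           /* not written back yet */

          /*
           * Order the qword1 read before the qword0 read. Without the
           * acquire fence, the qword0 load may be satisfied earlier and
           * return stale data even though DD is observed as set.
           */
          atomic_thread_fence(memory_order_acquire);

          out->qword0 = d->qword0;        /* safe to read the rest now */
          out->qword1 = q1;
          return true;
  }

The patch below applies the same pattern to the vectorized path: an
acquire fence after the descriptor loads, followed by a reload of
qword0 of each descriptor.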

Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
Cc: sta...@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
---
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index b2683fda60..71191c7cc8 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
                descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
                descs[0] =  vld1q_u64((uint64_t *)(rxdp));
 
+               /* Use acquire fence to order loads of descriptor qwords */
+               rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+               /* A.2 reload qword0 to make it ordered after qword1 load */
+               descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
+               descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
+               descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
+               descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+
                /* B.1 load 4 mbuf point */
                mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
                mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
-- 
2.25.1
