> -----Original Message----- > From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > Sent: Wednesday, September 15, 2021 2:33 AM > To: Ruifeng Wang <ruifeng.w...@arm.com>; dev@dpdk.org > Cc: beilei.x...@intel.com; qi.z.zh...@intel.com; > bruce.richard...@intel.com; jer...@marvell.com; > hemant.agra...@nxp.com; d...@linux.vnet.ibm.com; sta...@dpdk.org; nd > <n...@arm.com>; Ruifeng Wang <ruifeng.w...@arm.com>; Honnappa > Nagarahalli <honnappa.nagaraha...@arm.com>; nd <n...@arm.com> > Subject: RE: [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON > vector path > > <snip> > Similar comments that I have to patch 2/2 > > > > > Rx descriptor is 16B/32B in size and consists of multiple words. > > The word that includes DD field should be read first. Read result with > > DD bit set indicates the rest part in a descriptor is valid. > Suggest rewording as follows: > Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the > rest of > the descriptor words have valid values. Hence, the word containing DD bit > must be read first before reading the rest of the descriptor words. > > > > > In NEON vector PMD, vector load loads two contiguous 8B of descriptor > > data into vector register. Given vector load ensures no 16B atomicity, > > read of the word that includes DD field could be reordered after read > > of other words. In this case, some words could be invalid data. > "some words could contain invalid data" > > > > > Read barrier is added after read of qword1 that includes DD field. > > And qword0 is reloaded to update vector register. This ensures what > > fetched is correct descriptor data. > "This ensures that the fetched data is correct". > > Suggest capturing the performance impact, so it is clearly documented.
Added performance impact to commit message in v2. > > > > Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com> > With the above comments, > Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > Thanks for your review. Comments are addressed in v2. > > --- > > drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c > > b/drivers/net/i40e/i40e_rxtx_vec_neon.c > > index b2683fda60..71191c7cc8 100644 > > --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c > > +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c > > @@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue > > *__rte_restrict rxq, > > descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); > > descs[0] = vld1q_u64((uint64_t *)(rxdp)); > > > > + /* Use acquire fence to order loads of descriptor qwords */ > > + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); > > + /* A.2 reload qword0 to make it ordered after qword1 load > */ > > + descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], > > 0); > > + descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], > > 0); > > + descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], > > 0); > > + descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); > > + > > /* B.1 load 4 mbuf point */ > > mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); > > mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); > > -- > > 2.25.1