> -----Original Message-----
> From: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Sent: Wednesday, September 15, 2021 2:33 AM
> To: Ruifeng Wang <ruifeng.w...@arm.com>; dev@dpdk.org
> Cc: beilei.x...@intel.com; qi.z.zh...@intel.com;
> bruce.richard...@intel.com; jer...@marvell.com;
> hemant.agra...@nxp.com; d...@linux.vnet.ibm.com; sta...@dpdk.org; nd
> <n...@arm.com>; Ruifeng Wang <ruifeng.w...@arm.com>; Honnappa
> Nagarahalli <honnappa.nagaraha...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH 1/2] net/i40e: fix risk in Rx descriptor read in NEON
> vector path
> 
> <snip>
> Similar comments that I have to patch 2/2
> 
> >
> > Rx descriptor is 16B/32B in size and consists of multiple words.
> > The word that includes DD field should be read first. Read result with
> > DD bit set indicates the rest part in a descriptor is valid.
> Suggest rewording as follows:
> Rx descriptor is 16B/32B in size. If the DD bit is set, it indicates that the 
> rest of
> the descriptor words have valid values. Hence, the word containing DD bit
> must be read first before reading the rest of the descriptor words.
> 
> >
> > In NEON vector PMD, vector load loads two contiguous 8B of descriptor
> > data into vector register. Given vector load ensures no 16B atomicity,
> > read of the word that includes DD field could be reordered after read
> > of other words. In this case, some words could be invalid data.
> "some words could contain invalid data"
> 
> >
> > Read barrier is added after read of qword1 that includes DD field.
> > And qword0 is reloaded to update vector register. This ensures what
> > fetched is correct descriptor data.
> "This ensures that the fetched data is correct".
> 
> Suggest capturing the performance impact, so it is clearly documented.

Added performance impact to commit message in v2.
> >
> > Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>
> With the above comments,
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> 

Thanks for your review.
Comments are addressed in v2.
> > ---
> >  drivers/net/i40e/i40e_rxtx_vec_neon.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > index b2683fda60..71191c7cc8 100644
> > --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > @@ -286,6 +286,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue
> > *__rte_restrict rxq,
> >             descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
> >             descs[0] =  vld1q_u64((uint64_t *)(rxdp));
> >
> > +           /* Use acquire fence to order loads of descriptor qwords */
> > +           rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> > +           /* A.2 reload qword0 to make it ordered after qword1 load
> */
> > +           descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3],
> > 0);
> > +           descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2],
> > 0);
> > +           descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1],
> > 0);
> > +           descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
> > +
> >             /* B.1 load 4 mbuf point */
> >             mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
> >             mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> > --
> > 2.25.1

Reply via email to