>
> As packet length extraction code was simplified, the ordering was not
> necessary any more.[1]

IMO, there is no relationship between the compiler barrier and [1], at least on Arm platforms. I suggest we just say 'there is no reason for the compiler barrier'. I think this compiler barrier is not required on x86/PPC either.
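
To illustrate why I think the barrier has no effect here, below is a minimal sketch, not the driver code. It assumes a GCC/Clang toolchain, where a compiler barrier is typically an empty asm statement with a "memory" clobber. The names compiler_barrier(), shift_len(), with_barrier() and without_barrier() are hypothetical and only stand in for the pattern in the patch: the value is already loaded into a local before the barrier, and the shift that follows touches no memory, so there is nothing left for the barrier to order.

```c
#include <stdint.h>

/* Assumption: GCC/Clang-style compiler barrier. It emits no instruction and
 * only stops the compiler from moving memory accesses across this point. */
#define compiler_barrier() do { __asm__ volatile("" : : : "memory"); } while (0)

/* Hypothetical stand-in for the length extraction: works purely on a value
 * that is already held in a local (register), not on memory.
 * shl is assumed to be less than 32. */
static uint32_t shift_len(uint32_t desc_word, unsigned int shl)
{
	return desc_word << shl;
}

/* Variant with the barrier between the load and the shift. */
static uint32_t with_barrier(const uint32_t *desc, unsigned int shl)
{
	uint32_t d = *desc;	/* the only memory access, before the barrier */

	compiler_barrier();	/* no memory access follows, so nothing to order */
	return shift_len(d, shl);
}

/* Variant without the barrier: same load, same shift. */
static uint32_t without_barrier(const uint32_t *desc, unsigned int shl)
{
	uint32_t d = *desc;

	return shift_len(d, shl);
}
```

Both variants compute the same result. The only thing the barrier can change is how freely the compiler schedules the surrounding loads and stores, which may be where a performance difference comes from.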
>
> 2% performance gain was measured on Marvell ThunderX2.
> 4.3% performance gain was measured on Ampere eMAG80
>
> [1] http://mails.dpdk.org/archives/dev/2016-April/037529.html
>
> Fixes: ae0eb310f253 ("net/i40e: implement vector PMD for ARM")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Gavin Hu <gavin...@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.w...@arm.com>
> Reviewed-by: Steve Capper <steve.cap...@arm.com>
> ---
>  drivers/net/i40e/i40e_rxtx_vec_neon.c | 3 ---
>  1 file changed, 3 deletions(-)
>
> diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> index 5555e9b..864eb9a 100644
> --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
> +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
> @@ -307,9 +307,6 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq,
> struct rte_mbuf **rx_pkts,
>                       rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
>               }
>
> -             /* avoid compiler reorder optimization */
> -             rte_compiler_barrier();
> -
>               /* pkt 3,4 shift the pktlen field to be 16-bit aligned*/
>               uint32x4_t len3 =
> vshlq_u32(vreinterpretq_u32_u64(descs[3]),
>                                         len_shl);
> --
> 2.7.4