On Wed, May 25, 2016 at 05:59:38PM +0530, Jerin Jacob wrote: > On Fri, May 06, 2016 at 11:55:46AM +0530, Jianbo Liu wrote: > > use ARM NEON intrinsic to implement ixgbe vPMD > > > > Signed-off-by: Jianbo Liu <jianbo.liu at linaro.org> > > --- > > drivers/net/ixgbe/Makefile | 4 + > > drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 > > ++++++++++++++++++++++++++++++++ > > 2 files changed, 565 insertions(+) > > create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c > > <snip> > > + for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts; > > + pos += RTE_IXGBE_DESCS_PER_LOOP, > > + rxdp += RTE_IXGBE_DESCS_PER_LOOP) { > > + uint64x2_t descs[RTE_IXGBE_DESCS_PER_LOOP]; > > + uint8x16_t pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4; > > + uint8x16x2_t sterr_tmp1, sterr_tmp2; > > + uint64x2_t mbp1, mbp2; > > + uint8x16_t staterr; > > + uint16x8_t tmp; > > + uint32_t stat; > > + > > + /* B.1 load 1 mbuf point */ > > + mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); > > + > > + /* Read desc statuses backwards to avoid race condition */ > > + /* A.1 load 4 pkts desc */ > > + descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); > > + rte_rmb(); > > Any specific reason to add rte_rmb() here, If there is no performance > drop then it makes sense to add before descs[3] uses it.i.e > at rte_compiler_barrier() place in x86 code. > > > + > > + /* B.2 copy 2 mbuf point into rx_pkts */ > > + vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1); > > + > > + /* B.1 load 1 mbuf point */ > > + mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); > > + > > + descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); > > + /* B.1 load 2 mbuf point */ > > + descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); > > + descs[0] = vld1q_u64((uint64_t *)(rxdp)); > > + > > + /* B.2 copy 2 mbuf point into rx_pkts */ > > + vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); > > + > > + if (split_packet) { > > + rte_prefetch_non_temporal(&rx_pkts[pos]->cacheline1); > > + rte_prefetch_non_temporal(&rx_pkts[pos+1]->cacheline1); > > + rte_prefetch_non_temporal(&rx_pkts[pos+2]->cacheline1); > > + rte_prefetch_non_temporal(&rx_pkts[pos+3]->cacheline1); > > replace with rte_mbuf_prefetch_part2 or equivalent > Hi Jerin, Jianbo,
since this patch has already been applied and these are not critical issues with it, can a new patch please be submitted to propose these additional changes on top of what's on next-net now. Thanks, /Bruce