On 25 May 2016 at 20:29, Jerin Jacob <jerin.jacob at caviumnetworks.com> wrote:
> On Fri, May 06, 2016 at 11:55:46AM +0530, Jianbo Liu wrote:
>> use ARM NEON intrinsic to implement ixgbe vPMD
>>
>> Signed-off-by: Jianbo Liu <jianbo.liu at linaro.org>
>> ---
>>  drivers/net/ixgbe/Makefile              |   4 +
>>  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 ++++++++++++++++++++++++++++++++
>>  2 files changed, 565 insertions(+)
>>  create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
>> +		/* Read desc statuses backwards to avoid race condition */
>> +		/* A.1 load 4 pkts desc */
>> +		descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
>> +		rte_rmb();
>
> Any specific reason to add rte_rmb() here, If there is no performance
> drop then it makes sense to add before descs[3] uses it.i.e
> at rte_compiler_barrier() place in x86 code.
>

It is there to keep the descriptor statuses consistent, since they are
read backwards.

>> +
>> +		/* B.2 copy 2 mbuf point into rx_pkts */
>> +		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
>> +
>> +		/* B.1 load 1 mbuf point */
>> +		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
>> +
>> +		descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
>> +		/* B.1 load 2 mbuf point */
>> +		descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
>> +		descs[0] = vld1q_u64((uint64_t *)(rxdp));
>> +
>> +		/* B.2 copy 2 mbuf point into rx_pkts */
>> +		vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);
>> +
>> +		if (split_packet) {
>> +			rte_prefetch_non_temporal(&rx_pkts[pos]->cacheline1);
>> +			rte_prefetch_non_temporal(&rx_pkts[pos+1]->cacheline1);
>> +			rte_prefetch_non_temporal(&rx_pkts[pos+2]->cacheline1);
>> +			rte_prefetch_non_temporal(&rx_pkts[pos+3]->cacheline1);
>
> replace with rte_mbuf_prefetch_part2 or equivalent
>

rte_mbuf_prefetch_part2 is a new function that was added after this
patchset, so it's better to submit a separate patch, as Bruce said.
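
To illustrate the barrier point above, here is a minimal, portable sketch
of reading descriptor statuses backwards. It is not the driver code: the
struct, the DEMO_DD bit and demo_count_done() are hypothetical names, and
C11 acquire loads stand in for rte_rmb(). It also assumes the device
completes descriptors in ring order, so that a "done" status at a higher
index implies "done" at every lower index.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical "descriptor done" bit; not the real ixgbe definition. */
#define DEMO_DD UINT64_C(0x1)

/* Hypothetical, simplified descriptor; the real layout is hardware-defined. */
struct demo_desc {
	uint64_t addr;
	_Atomic uint64_t status;
};

/*
 * Return how many of the 4 descriptors, counted from index 0, are done.
 *
 * Statuses are loaded from index 3 down to index 0. Each acquire load
 * prevents the loads after it (in program order) from being reordered
 * before it, so a lower-index status is never older than a higher-index
 * one. Under the in-order completion assumption this rules out seeing
 * descriptor 3 as done while descriptor 2 still looks pending.
 */
static unsigned int demo_count_done(struct demo_desc *ring)
{
	uint64_t st[4];
	unsigned int n = 0;

	st[3] = atomic_load_explicit(&ring[3].status, memory_order_acquire);
	st[2] = atomic_load_explicit(&ring[2].status, memory_order_acquire);
	st[1] = atomic_load_explicit(&ring[1].status, memory_order_acquire);
	/* Last load: nothing later in this function needs ordering after it. */
	st[0] = atomic_load_explicit(&ring[0].status, memory_order_relaxed);

	/* Count the leading run of completed descriptors. */
	while (n < 4 && (st[n] & DEMO_DD))
		n++;
	return n;
}
```

Note that the sketch orders every load, which is more conservative than
the quoted hunk, where a single rte_rmb() follows the descs[3] load.
Whether one barrier is enough, and where it costs least, is exactly the
performance question raised in the review.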