On Wed, May 25, 2016 at 05:59:38PM +0530, Jerin Jacob wrote:
> On Fri, May 06, 2016 at 11:55:46AM +0530, Jianbo Liu wrote:
> > use ARM NEON intrinsic to implement ixgbe vPMD
> > 
> > Signed-off-by: Jianbo Liu <jianbo.liu at linaro.org>
> > ---
> >  drivers/net/ixgbe/Makefile              |   4 +
> >  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 ++++++++++++++++++++++++++++++++
> >  2 files changed, 565 insertions(+)
> >  create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> > 
<snip>
> > +   for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
> > +                   pos += RTE_IXGBE_DESCS_PER_LOOP,
> > +                   rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
> > +           uint64x2_t descs[RTE_IXGBE_DESCS_PER_LOOP];
> > +           uint8x16_t pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> > +           uint8x16x2_t sterr_tmp1, sterr_tmp2;
> > +           uint64x2_t mbp1, mbp2;
> > +           uint8x16_t staterr;
> > +           uint16x8_t tmp;
> > +           uint32_t stat;
> > +
> > +           /* B.1 load 1 mbuf point */
> > +           mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
> > +
> > +           /* Read desc statuses backwards to avoid race condition */
> > +           /* A.1 load 4 pkts desc */
> > +           descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
> > +           rte_rmb();
> 
> Any specific reason to add rte_rmb() here? If there is no performance
> drop, then it makes sense to add it before descs[3] is used, i.e. at
> the rte_compiler_barrier() place in the x86 code.
> 
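For comparison, the x86 path being referred to looks roughly like the excerpt
below (quoted from memory of the SSE version of this loop, so treat it as
approximate rather than authoritative):

	/* B.1 load 1 mbuf point */
	mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);

	/* Read desc statuses backwards to avoid race condition */
	/* A.1 load 4 pkts desc */
	descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));

	/* on x86 this is only a compiler barrier: the strongly ordered
	 * memory model means no fence instruction is emitted here
	 */
	rte_compiler_barrier();

	/* B.2 copy 2 mbuf point into rx_pkts  */
	_mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
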
> > +
> > +           /* B.2 copy 2 mbuf point into rx_pkts  */
> > +           vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
> > +
> > +           /* B.1 load 1 mbuf point */
> > +           mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> > +
> > +           descs[2] =  vld1q_u64((uint64_t *)(rxdp + 2));
> > +           /* B.1 load 2 mbuf point */
> > +           descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
> > +           descs[0] =  vld1q_u64((uint64_t *)(rxdp));
> > +
> > +           /* B.2 copy 2 mbuf point into rx_pkts  */
> > +           vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);
> > +
> > +           if (split_packet) {
> > +                   rte_prefetch_non_temporal(&rx_pkts[pos]->cacheline1);
> > +                   rte_prefetch_non_temporal(&rx_pkts[pos+1]->cacheline1);
> > +                   rte_prefetch_non_temporal(&rx_pkts[pos+2]->cacheline1);
> > +                   rte_prefetch_non_temporal(&rx_pkts[pos+3]->cacheline1);
> 
> Replace this with rte_mbuf_prefetch_part2() or an equivalent helper.
> 
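For reference, the suggested change would presumably look something like the
untested sketch below. Note that rte_mbuf_prefetch_part2() issues a plain
rte_prefetch0() on the mbuf's second cache line (when the mbuf spans two
cache lines) rather than a non-temporal prefetch, so it is not a strict
drop-in behavioural match; its main point is to hide the cacheline1 marker
behind the mbuf API instead of touching it directly:

	if (split_packet) {
		/* prefetch the second cache line of each mbuf via the
		 * rte_mbuf helper instead of dereferencing cacheline1
		 */
		rte_mbuf_prefetch_part2(rx_pkts[pos]);
		rte_mbuf_prefetch_part2(rx_pkts[pos + 1]);
		rte_mbuf_prefetch_part2(rx_pkts[pos + 2]);
		rte_mbuf_prefetch_part2(rx_pkts[pos + 3]);
	}
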
Hi Jerin, Jianbo,

Since this patch has already been applied and these are not critical issues
with it, could a new patch please be submitted to propose these additional
changes on top of what is now on next-net?

Thanks,
/Bruce
