On 9 February 2017 at 03:53, Ananyev, Konstantin <konstantin.anan...@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Ananyev, Konstantin
>> Sent: Wednesday, February 8, 2017 6:54 PM
>> To: Yigit, Ferruh <ferruh.yi...@intel.com>; Jianbo Liu
>> <jianbo....@linaro.org>; dev@dpdk.org; Zhang, Helin <helin.zh...@intel.com>;
>> jerin.ja...@caviumnetworks.com
>> Subject: Re: [dpdk-dev] [PATCH v2 1/2] net/ixgbe: calculate the correct
>> number of received packets in bulk alloc function
>>
>> Hi Ferruh,
>>
>> >
>> > On 2/4/2017 1:26 PM, Ananyev, Konstantin wrote:
>> > >>
>> > >> To get better performance, Rx bulk alloc recv function will scan 8 descs
>> > >> in one time, but the statuses are not consistent on ARM platform because
>> > >> the memory allocated for Rx descriptors is cacheable hugepages.
>> > >> This patch is to calculate the number of received packets by scan DD bit
>> > >> sequentially, and stops when meeting the first packet with DD bit unset.
>> > >>
>> > >> Signed-off-by: Jianbo Liu <jianbo....@linaro.org>
>> > >> ---
>> > >>  drivers/net/ixgbe/ixgbe_rxtx.c | 16 +++++++++-------
>> > >>  1 file changed, 9 insertions(+), 7 deletions(-)
>> > >>
>> > >> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
>> > >> index 36f1c02..613890e 100644
>> > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
>> > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
>> > >> @@ -1460,17 +1460,19 @@ static inline int __attribute__((always_inline))
>> > >>      for (i = 0; i < RTE_PMD_IXGBE_RX_MAX_BURST;
>> > >>           i += LOOK_AHEAD, rxdp += LOOK_AHEAD, rxep += LOOK_AHEAD) {
>> > >>          /* Read desc statuses backwards to avoid race condition */
>> > >> -        for (j = LOOK_AHEAD-1; j >= 0; --j)
>> > >> +        for (j = 0; j < LOOK_AHEAD; j++)
>> > >>              s[j] = rte_le_to_cpu_32(rxdp[j].wb.upper.status_error);
>> > >>
>> > >> -        for (j = LOOK_AHEAD - 1; j >= 0; --j)
>> > >> -            pkt_info[j] = rte_le_to_cpu_32(rxdp[j].wb.lower.
>> > >> -                                           lo_dword.data);
>> > >> +        rte_smp_rmb();
>> > >>
>> > >>          /* Compute how many status bits were set */
>> > >> -        nb_dd = 0;
>> > >> -        for (j = 0; j < LOOK_AHEAD; ++j)
>> > >> -            nb_dd += s[j] & IXGBE_RXDADV_STAT_DD;
>> > >> +        for (nb_dd = 0; nb_dd < LOOK_AHEAD &&
>> > >> +                (s[nb_dd] & IXGBE_RXDADV_STAT_DD); nb_dd++)
>> > >> +            ;
>> > >> +
>> > >> +        for (j = 0; j < nb_dd; j++)
>> > >> +            pkt_info[j] = rte_le_to_cpu_32(rxdp[j].wb.lower.
>> > >> +                                           lo_dword.data);
>> > >>
>> > >>          nb_rx += nb_dd;
>> > >>
>> > >> --
>> > >
>> > > Acked-by: Konstantin Ananyev <konstantin.anan...@intel.com>
>> >
>> > Hi Konstantin,
>> >
>> > Is the ack valid for v3 and both patches?
>>
>> No, I didn't look into the second one in details.
>> It is ARM specific, and I left it for people who are more familiar with ARM
>> then me :)
>> Konstantin
>
> Actually, I had a quick look after your mail.
>
> +        /* A.1 load 1 pkts desc */
> +        descs[0] = vld1q_u64((uint64_t *)(rxdp));
> +        rte_smp_rmb();
>
>          /* B.2 copy 2 mbuf point into rx_pkts */
>          vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
> @@ -271,10 +270,11 @@
>          /* B.1 load 1 mbuf point */
>          mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
>
> -        descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
> -        /* B.1 load 2 mbuf point */
>          descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
> -        descs[0] = vld1q_u64((uint64_t *)(rxdp));
> +
> +        /* A.1 load 2 pkts descs */
> +        descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
> +        descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
>
> Assuming that on all ARM-NEON platforms 16B reads are atomic,
> I think there is no need for smp_rmb() after the desc[0] read.
> What looks more appropriate to me:

With checking DDs in sequence, it doesn't matter much where the rmb is.
But there is a little performance improvement (0.02%) in my testing with
your suggestion.
So I'll send a new version. Thanks!

>
> descs[0] = vld1q_u64((uint64_t *)(rxdp));
> descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
> descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
> descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
>
> rte_smp_rmb();
>
> ...
>
> But, as I said would be good if some ARM guys have a look here.
> Konstantin
>
>
>>
>> >
>> > Thanks,
>> > ferruh
>> >
>> > >
>> > >> 1.8.3.1
>> > >
>
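
For readers following the thread, here is a minimal standalone sketch of the difference between the two ways of counting completed descriptors discussed in patch 1/2. It is not the driver code: `STAT_DD` is a stand-in for `IXGBE_RXDADV_STAT_DD`, its value is an assumption made for illustration, the status words are plain integers rather than real descriptor fields, and the function names `count_dd_sum` and `count_dd_prefix` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOOK_AHEAD 8
/* Stand-in for IXGBE_RXDADV_STAT_DD; the value is assumed for illustration. */
#define STAT_DD 0x01u

/*
 * Pre-patch style: add up the DD bits of all LOOK_AHEAD status words.
 * The result is only a valid packet count if the set bits form a
 * contiguous run starting at index 0.
 */
static int count_dd_sum(const uint32_t *s)
{
	int nb_dd = 0;
	int j;

	assert(s != NULL);
	for (j = 0; j < LOOK_AHEAD; ++j)
		nb_dd += s[j] & STAT_DD;
	return nb_dd;
}

/*
 * Patched style: walk the status words in order and stop at the first
 * one whose DD bit is clear, so a later descriptor that appears done
 * is never counted ahead of an earlier one that does not.
 */
static int count_dd_prefix(const uint32_t *s)
{
	int nb_dd = 0;

	assert(s != NULL);
	while (nb_dd < LOOK_AHEAD && (s[nb_dd] & STAT_DD))
		nb_dd++;
	return nb_dd;
}
```

With an inconsistent snapshot such as `{1, 1, 0, 1, 0, 0, 0, 0}`, `count_dd_sum` returns 3 while `count_dd_prefix` returns 2, so only the sequential scan avoids handing up descriptor 2 before its DD bit has been observed.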