> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, September 14, 2016 9:25 PM
> To: Vladyslav Buslov <vladyslav.buslov at harmonicinc.com>; Zhang, Helin
> <helin.zhang at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch
> instructions for bulk rx
>
> On 7/14/2016 6:27 PM, Vladyslav Buslov wrote:
> > Added prefetch of first packet payload cacheline in i40e_rx_scan_hw_ring
> > Added prefetch of second mbuf cacheline in i40e_rx_alloc_bufs
> >
> > Signed-off-by: Vladyslav Buslov <vladyslav.buslov at harmonicinc.com>
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> > index d3cfb98..e493fb4 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -1003,6 +1003,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> >  		/* Translate descriptor info to mbuf parameters */
> >  		for (j = 0; j < nb_dd; j++) {
> >  			mb = rxep[j].mbuf;
> > +			rte_prefetch0(RTE_PTR_ADD(mb->buf_addr, RTE_PKTMBUF_HEADROOM));

Why prefetch here? I think that if the application needs to deal with the
packet, it is more suitable to put the prefetch in the application.

> >  			qword1 = rte_le_to_cpu_64(\
> >  				rxdp[j].wb.qword1.status_error_len);
> >  			pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> > @@ -1086,9 +1087,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> >
> >  	rxdp = &rxq->rx_ring[alloc_idx];
> >  	for (i = 0; i < rxq->rx_free_thresh; i++) {
> > -		if (likely(i < (rxq->rx_free_thresh - 1)))
> > +		if (likely(i < (rxq->rx_free_thresh - 1))) {
> >  			/* Prefetch next mbuf */
> > -			rte_prefetch0(rxep[i + 1].mbuf);
> > +			rte_prefetch0(&rxep[i + 1].mbuf->cacheline0);
> > +			rte_prefetch0(&rxep[i + 1].mbuf->cacheline1);
> > +		}

Agree with this change. However, when I tested it using testpmd with iofwd,
I observed no performance increase, only a minor decrease.
Can you share with us the scenario in which it benefits performance?

Thanks
Jingjing