> -----Original Message-----
> From: Richardson, Bruce <bruce.richard...@intel.com>
> Sent: Wednesday, September 29, 2021 12:08 PM
> To: Ananyev, Konstantin <konstantin.anan...@intel.com>
> Cc: Xueming(Steven) Li <xuemi...@nvidia.com>; jerinjac...@gmail.com;
> NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> andrew.rybche...@oktetlabs.ru; dev@dpdk.org; Yigit, Ferruh
> <ferruh.yi...@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v1] ethdev: introduce shared Rx queue
>
> On Wed, Sep 29, 2021 at 09:52:20AM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: Xueming(Steven) Li <xuemi...@nvidia.com>
> > > Sent: Wednesday, September 29, 2021 10:13 AM
> <snip>
> > > > + /* Locate real source fs according to mbuf->port. */
> > > > + for (i = 0; i < nb_rx; ++i) {
> > > > + rte_prefetch0(pkts_burst[i + 1]);
> > > >
> > > > you access pkt_burst[] beyond array boundaries,
> > > > also you ask cpu to prefetch some unknown and possibly invalid
> > > > address.
> > >
> > > Sorry I forgot this topic. It's too late to prefetch current packet, so
> > > perfetch next is better. Prefetch an invalid address at end of a look
> > > doesn't hurt, it's common in DPDK.
> >
> > First of all it is usually never 'OK' to access array beyond its bounds.
> > Second prefetching invalid address *does* hurt performance badly on many
> > CPUs
> > (TLB misses, consumed memory bandwidth etc.).
> > As a reference: https://lwn.net/Articles/444346/
> > If some existing DPDK code really does that - then I believe it is an issue
> > and has to be addressed.
> > More important - it is really bad attitude to submit bogus code to DPDK
> > community
> > and pretend that it is 'OK'.
> >
>
> The main point we need to take from all this is that when
> prefetching you need to measure perf impact of it.
>
> In terms of the specific case of prefetching one past the end of the array,
> I would take the view that this is harmless in almost all cases. Unlike any
> prefetch of "NULL" as in the referenced mail, reading one past the end (or
> other small number of elements past the end) is far less likely to cause a
> TLB miss - and it's basically just reproducing behaviour we would expect
> off a HW prefetcher (though those my explicitly never cross page
> boundaries). However, if you feel it's just cleaner to put in an
> additional condition to remove the prefetch for the end case, that's ok
> also - again so long as it doesn't affect performance. [Since prefetch is a
> hint, I'm not sure if compilers or CPUs may be legally allowed to skip the
> branch and blindly prefetch in all cases?]
Please look at the code.
It doesn't prefetch next element beyond array boundaries.
It first reads address from the element that is beyond array boundaries (which
is a bug by itself).
Then it prefetches that bogus address.
We simply don't know is this address is valid and where it points to.
In other words, it doesn't do:
rte_prefetch0(&pkts_burst[i + 1]);
It does:
rte_prefetch0(pkts_burst[i + 1]);