On Fri, Jan 12, 2018 at 01:29:17PM +0000, Matan Azrad wrote:
> Hi Gaetan
> 
> From: Gaëtan Rivet, Friday, January 12, 2018 12:29 PM
> > Hi Matan,
> > 
> > The other commits make sense to me so no issue there.
> > I'm just surprised by this one so a quick question.
> > 
> > On Tue, Dec 19, 2017 at 05:14:29PM +0000, Matan Azrad wrote:
> > > Connecting the sub-devices to each other in a cyclic linked list helps
> > > iterate over them in the Rx burst functions, because there is no need
> > > to check for sub-device ring wraparound.
> > >
> > > Create the aforementioned linked list and change the Rx burst
> > > functions' iteration accordingly.
> > 
> > I'm surprised that a linked-list iteration, with the usual dereferencing, is
> > better than doing some integer arithmetic.
> 
> These memory references are the same as in the previous code, because in the
> new code the linked-list elements are still in contiguous memory, so the
> addresses probably stay in the cache.
> The removed calculations and wraparound branch probably caused the
> performance gain.
> 
> > Maybe the locality of the referenced data helps.
> > 
> Sure.

This means that the sub_device definition is critical for the datapath.
It probably spans more than one cache line and could be optimized.
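For reference, here is a minimal sketch of the two Rx iteration styles being
compared. It assumes a heavily simplified sub-device: the `pending` field
stands in for what the sub-device's Rx queue would return, and
`link_cyclic()`, `rx_by_index()` and `rx_by_list()` are hypothetical names,
not the fail-safe PMD's actual code. Both loops poll from a cursor until one
sub-device yields packets or they come back to the start; only the index
version pays for the increment and wraparound branch on each step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified sub-device; "pending" models its Rx queue. */
struct sub_device {
	unsigned int pending;
	struct sub_device *next; /* cyclic: the last entry points to the first */
};

/* Link nb contiguous sub-devices into a cyclic list (done once, off the datapath). */
static void
link_cyclic(struct sub_device *subs, unsigned int nb)
{
	unsigned int i;

	assert(nb > 0);
	for (i = 0; i < nb; i++)
		subs[i].next = &subs[(i + 1) % nb];
}

/* Old style: round-robin by index, with a wraparound check on every step. */
static unsigned int
rx_by_index(struct sub_device *subs, unsigned int nb, unsigned int *cur)
{
	unsigned int i = *cur;
	unsigned int nb_rx;

	do {
		nb_rx = subs[i].pending;
		subs[i].pending = 0;
		if (++i == nb) /* wraparound branch */
			i = 0;
	} while (nb_rx == 0 && i != *cur);
	*cur = i; /* next burst starts after the last polled sub-device */
	return nb_rx;
}

/* New style: follow the cyclic list, no index arithmetic or wraparound branch. */
static unsigned int
rx_by_list(struct sub_device **cur)
{
	struct sub_device *sdev = *cur;
	unsigned int nb_rx;

	do {
		nb_rx = sdev->pending;
		sdev->pending = 0;
		sdev = sdev->next;
	} while (nb_rx == 0 && sdev != *cur);
	*cur = sdev; /* next burst starts after the last polled sub-device */
	return nb_rx;
}
```

Because the array is contiguous, each `next` pointer targets memory that is
likely already cached, which is consistent with the locality argument above.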

> 
> > Anyway, were you able to count the cycles gained by this change? It might be
> > interesting to take a measurement with a CPU-bound benchmark, such as a dummy
> > device (ring or similar) under the fail-safe. MLX devices use a lot of PCI
> > bandwidth, so the bottleneck could be masked in a physical setting.
> > 
> > No other comments; if you are sure that this is a performance gain, the
> > implementation seems OK to me.
> 
> Yes, I checked it and clearly saw the small gain
> (I just ran the test with and without this patch and compared the statistics).

Oh I'm sure you checked, I just wanted to make sure you properly
considered the methodology.

Acked-by: Gaetan Rivet <gaetan.ri...@6wind.com>

-- 
Gaëtan Rivet
6WIND
