Re: [PATCH] x86: Run checksumming in parallel accross multiple alu's

Neil Horman Sun, 13 Oct 2013 05:54:14 -0700

On Sat, Oct 12, 2013 at 07:21:24PM +0200, Ingo Molnar wrote:
> 
> * Neil Horman <[email protected]> wrote:
> 
> > Sébastien Dugué reported to me that devices implementing ipoib (which 
> > don't have checksum offload hardware) were spending a significant amount 
> > of time computing checksums.  We found that by splitting the checksum 
> > computation into two separate streams, each summing alternate elements 
> > of the buffer, we could parallelize the checksum operation across 
> > multiple ALUs.  Since neither chain is dependent on the result of the 
> > other, we get a speedup in execution (on hardware that has multiple 
> > ALUs available, which is almost ubiquitous on x86), and only a 
> > negligible decrease on hardware that has only a single ALU (an extra 
> > addition is introduced).  Since addition is commutative and associative, 
> > the result is the same, only faster.
> 
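For readers following the thread, here is a minimal sketch of the idea
described above.  It is not the patch itself: it is portable C rather than
the x86 assembly the kernel uses, and the names fold16(), csum_single() and
csum_dual() are hypothetical, chosen only for illustration.  It assumes the
buffer is an array of 16-bit words and uses 64-bit accumulators so that
carries are folded once at the end.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator to 16 bits, adding carries back in
 * (ones' complement end-around carry). */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference version: one accumulator, so every add depends on the
 * previous one. */
static uint16_t csum_single(const uint16_t *buf, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return fold16(sum);
}

/* Two-stream version: even- and odd-indexed words go to separate
 * accumulators.  Neither chain depends on the other, so a CPU with
 * more than one ALU is free to execute the two adds in parallel. */
static uint16_t csum_dual(const uint16_t *buf, size_t n)
{
    uint64_t even = 0, odd = 0;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        even += buf[i];
        odd  += buf[i + 1];
    }
    if (i < n)              /* leftover word when n is odd */
        even += buf[i];

    /* The one extra addition that joins the two chains. */
    return fold16(even + odd);
}
```

Both functions return the same value for any input, because the order in
which the words are added does not change the total.  Whether the two-stream
form is actually faster depends on the CPU and the compiler, which is what
the measurements requested below would have to show.
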
> This patch should really come with measurement numbers: what performance 
> increase (and drop) did you get on what CPUs.
> 
> Thanks,
> 
Sure, I can gather some stats for you.  I'll post them later this week.
Neil


>       Ingo
> 
