On Sat, Oct 12, 2013 at 03:29:24PM -0700, H. Peter Anvin wrote:
> On 10/11/2013 09:51 AM, Neil Horman wrote:
> > Sébastien Dugué reported to me that devices implementing ipoib (which don't
> > have checksum offload hardware) were spending a significant amount of time
> > computing checksums.  We found that by splitting the checksum computation
> > into two separate streams, each skipping successive elements of the buffer
> > being summed, we could parallelize the checksum operation across multiple
> > ALUs.  Since neither chain is dependent on the result of the other, we get
> > a speedup in execution (on hardware that has multiple ALUs available, which
> > is almost ubiquitous on x86), and only a negligible decrease on hardware
> > that has only a single ALU (an extra addition is introduced).  Since
> > addition is commutative, the result is the same, only faster.
> 
> On hardware that implements ADCX/ADOX, you should also be able to
> have additional streams interleaved, since those instructions allow for
> dual carry chains.
> 
OK, that's a good idea. I'll look into those instructions this week.
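
For anyone following along, the two-stream split from the patch description
can be sketched in portable C roughly as below. This is only an illustration
of the idea, not the code from the patch: the names csum_two_streams,
csum_reference and fold64 are made up for this example, the sum is returned
in host byte order without the final complement, and 64-bit accumulators
stand in for the adc carry chains that the real x86 code would use. It
assumes buffers small enough that the 64-bit accumulators cannot overflow
(anything well below 2^32 bytes is safe).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fold a 64-bit accumulator down to a 16-bit ones' complement sum. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Hypothetical single-stream reference: one 16-bit word at a time. */
uint16_t csum_reference(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint32_t sum = 0;
    uint16_t h;

    while (len >= 2) {
        memcpy(&h, p, 2);
        sum += h;
        sum = (sum & 0xffff) + (sum >> 16);   /* end-around carry */
        p += 2;
        len -= 2;
    }
    if (len) {                                /* odd trailing byte */
        h = 0;
        memcpy(&h, p, 1);
        sum += h;
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t)sum;
}

/*
 * Hypothetical two-stream version: sum_a takes the even 32-bit words,
 * sum_b the odd ones.  Neither accumulator depends on the other, so a
 * CPU with more than one ALU can execute the two additions in parallel.
 */
uint16_t csum_two_streams(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t sum_a = 0, sum_b = 0;
    uint32_t w0, w1;
    uint16_t h;

    while (len >= 8) {
        memcpy(&w0, p, 4);
        memcpy(&w1, p + 4, 4);
        sum_a += w0;                          /* stream A */
        sum_b += w1;                          /* stream B */
        p += 8;
        len -= 8;
    }
    while (len >= 2) {                        /* leftover 16-bit words */
        memcpy(&h, p, 2);
        sum_a += h;
        p += 2;
        len -= 2;
    }
    if (len) {                                /* odd trailing byte */
        h = 0;
        memcpy(&h, p, 1);
        sum_a += h;
    }
    /* Addition is commutative, so the streams can be merged at the end. */
    return fold64(sum_a + sum_b);
}
```

Both functions should return the same value for any buffer. The ADCX/ADOX
suggestion would go one step further by keeping two independent carry chains
in flight, which this sketch does not attempt.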
Neil

>       -hpa