On Sat, Oct 12, 2013 at 03:29:24PM -0700, H. Peter Anvin wrote:
> On 10/11/2013 09:51 AM, Neil Horman wrote:
> > Sébastien Dugué reported to me that devices implementing ipoib (which don't
> > have checksum offload hardware) were spending a significant amount of time
> > computing checksums. We found that by splitting the checksum computation
> > into two separate streams, each skipping successive elements of the buffer
> > being summed, we could parallelize the checksum operation across multiple
> > ALUs. Since neither chain is dependent on the result of the other, we get a
> > speedup in execution (on hardware that has multiple ALUs available, which is
> > almost ubiquitous on x86), and only a negligible slowdown on hardware that
> > has only a single ALU (an extra addition is introduced). Since addition is
> > commutative, the result is the same, only faster.
>
> On hardware that implements ADCX/ADOX, you should also be able to
> have additional streams interleaved, since those instructions allow for
> dual carry chains.
>
Ok, that's a good idea. I'll look into those instructions this week.
Neil
> -hpa
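For illustration, here is a minimal portable C sketch of the two-stream idea described above. It assumes an RFC 1071-style ones'-complement sum over big-endian 16-bit words. The names fold16, csum_one_stream, csum_two_streams and check_streams_agree are hypothetical. This is not the kernel's x86 csum_partial code from the patch, which works on wider words with add-with-carry instructions, and a compiler is free to reorder or vectorize either loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator down to a 16-bit ones'-complement sum
 * (end-around carry). */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference version: a single dependency chain, because every word is
 * added to the same accumulator. */
static uint16_t csum_one_stream(const uint8_t *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)buf[i] << 8 | buf[i + 1];
    if (len & 1)
        sum += (uint32_t)buf[len - 1] << 8;   /* pad odd byte with zero */
    return fold16(sum);
}

/* Hypothetical two-stream version: alternate 16-bit words go to sum0 and
 * sum1. Neither addition depends on the other, so a CPU with several ALUs
 * can execute them in parallel. The only extra work is sum0 + sum1 at
 * the end. */
static uint16_t csum_two_streams(const uint8_t *buf, size_t len)
{
    uint64_t sum0 = 0, sum1 = 0;
    size_t i = 0;

    for (; i + 3 < len; i += 4) {
        sum0 += (uint32_t)buf[i] << 8 | buf[i + 1];
        sum1 += (uint32_t)buf[i + 2] << 8 | buf[i + 3];
    }
    if (i + 1 < len) {                        /* one leftover whole word */
        sum0 += (uint32_t)buf[i] << 8 | buf[i + 1];
        i += 2;
    }
    if (i < len)                              /* trailing odd byte */
        sum0 += (uint32_t)buf[i] << 8;
    return fold16(sum0 + sum1);
}

/* Check that both variants agree for every length from 0 to max_len. */
static void check_streams_agree(size_t max_len)
{
    uint8_t buf[2048];
    size_t i, len;

    assert(max_len <= sizeof(buf));
    for (i = 0; i < max_len; i++)
        buf[i] = (uint8_t)(i * 131u + 7u);    /* arbitrary fill pattern */
    for (len = 0; len <= max_len; len++)
        assert(csum_one_stream(buf, len) == csum_two_streams(buf, len));
}
```

Both functions add exactly the same words and differ only in grouping, so their results are identical. For the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7, either function returns 0xddf2.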