On 10/11/2013 09:51 AM, Neil Horman wrote:
> Sébastien Dugué reported to me that devices implementing ipoib (which don't
> have checksum offload hardware) were spending a significant amount of time
> computing checksums. We found that by splitting the checksum computation
> into two separate streams, each skipping successive elements of the buffer
> being summed, we could parallelize the checksum operation across multiple
> ALUs. Since neither chain is dependent on the result of the other, we get a
> speedup in execution (on hardware that has multiple ALUs available, which is
> almost ubiquitous on x86), and only a negligible decrease on hardware that
> has only a single ALU (an extra addition is introduced). Since addition is
> commutative, the result is the same, only faster.
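For illustration only, the two-stream split described in the quoted text can be sketched in portable C roughly as follows. This is not the code from the patch: the function names (fold64, csum_one_stream, csum_two_streams) and the choice of summing 32-bit words into 64-bit accumulators are assumptions made for the sketch, and the real x86 routine works with add-with-carry rather than wide accumulators.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a 64-bit accumulator down to a 16-bit
 * ones' complement sum by repeatedly adding the high half into the low half. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference version: a single accumulator, so every addition depends
 * on the result of the previous one. */
static uint16_t csum_one_stream(const uint32_t *buf, size_t nwords)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < nwords; i++)
        sum += buf[i];
    return fold64(sum);
}

/* Two-stream version: even-indexed words go to sum0, odd-indexed words
 * to sum1. The two chains never read each other, so a CPU with more
 * than one ALU can execute them in parallel. The cost is one extra
 * addition at the end to combine the streams.
 * Assumes nwords is small enough that the 64-bit accumulators cannot
 * overflow (true for any realistic packet buffer). */
static uint16_t csum_two_streams(const uint32_t *buf, size_t nwords)
{
    uint64_t sum0 = 0, sum1 = 0;
    size_t i;

    for (i = 0; i + 1 < nwords; i += 2) {
        sum0 += buf[i];
        sum1 += buf[i + 1];
    }
    if (i < nwords)          /* odd word count: one element left over */
        sum0 += buf[i];

    return fold64(sum0 + sum1);
}
```

Because addition is commutative and associative, csum_two_streams() returns the same value as csum_one_stream() for any input; only the dependency structure of the loop changes.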
On hardware that implements ADCX/ADOX, you should also be able to interleave
additional streams, since those instructions allow for dual carry chains.

	-hpa