On Sat, Oct 12, 2013 at 03:29:24PM -0700, H. Peter Anvin wrote:
> On 10/11/2013 09:51 AM, Neil Horman wrote:
> > Sébastien Dugué reported to me that devices implementing ipoib (which don't
> > have checksum offload hardware) were spending a significant amount of time
> > computing checksums.  We found that by splitting the checksum computation
> > into two separate streams, each skipping successive elements of the buffer
> > being summed, we could parallelize the checksum operation across multiple
> > ALUs.  Since neither chain is dependent on the result of the other, we get a
> > speedup in execution (on hardware that has multiple ALUs available, which is
> > almost ubiquitous on x86), and only a negligible decrease on hardware that
> > has only a single ALU (an extra addition is introduced).  Since addition is
> > commutative, the result is the same, only faster.
> 
> On hardware that implements ADCX/ADOX, you should also be able to
> have additional streams interleaved since those instructions allow for
> dual carry chains.
> 
>       -hpa
> 
I've been looking into this a bit more, and I'm a bit confused.  According to
this:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/ia-large-integer-arithmetic-paper.html

by my read, this pair of instructions simply supports 2 carry bit chains,
allowing for two parallel execution paths through the cpu that won't block on
one another.  It's exactly the same as what's being done with the universally
available adcq instruction, so there's no real speedup (that I can see).  Since
we'd either have to use the alternatives macro to support adcx/adox here or the
old instruction set, it doesn't seem worth the effort to support the
extension.
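
To make the two-stream idea concrete, here is a minimal portable C sketch.  It
is not the patch itself and not the kernel's do_csum(): the function names
(fold16, csum_one_stream, csum_two_streams) are made up for illustration, and
it uses plain 64-bit adds with a final fold rather than adc carry chains.  It
only shows that two independent accumulators over alternating elements give the
same folded sum as a single accumulator:

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 64-bit accumulator down to a 16-bit ones'-complement sum. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Reference: a single accumulator, so each add depends on the previous one. */
static uint16_t csum_one_stream(const uint16_t *buf, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return fold16(sum);
}

/*
 * Two accumulators over alternating elements.  Neither chain reads the
 * other's result inside the loop, so a CPU with more than one ALU is free
 * to execute the two adds in parallel.
 */
static uint16_t csum_two_streams(const uint16_t *buf, size_t n)
{
    uint64_t a = 0, b = 0;
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        a += buf[i];
        b += buf[i + 1];
    }
    if (i < n)          /* odd element count: pick up the last element */
        a += buf[i];
    return fold16(a + b);   /* the one extra addition that merges the chains */
}
```

Whether a compiler actually emits two independent dependency chains for this
is up to the compiler and target; the real code would do this in assembly.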

Or am I missing something?

Neil