> Just guessing. Maybe the new code allows more concurrency? If the
> value is not in the processor cache, will the old code block other
> processors for much longer? The new code forces caching with the first
> read, so there may be a higher likelihood that cmpxchg will finish
> faster. I haven't studied x86 cache behavior, so this guess could be
> completely wrong. Suggest asking on comp.arch, where people like Andy
> Glew can give you a definitive answer.

according to intel, this is a myth.  search for "myth" on this page:

http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/

and this stands to reason, since both techniques revolve around a
LOCK-prefixed instruction, and so both invoke the x86 cache coherence
protocol (MESI/MESIF).

the difference, and my main point, is that the loop in ainc means
that it is not a wait-free algorithm: a processor can lose the
cmpxchg race and retry an unbounded number of times.  this is not
only suboptimal, but could also lead to incorrect behavior.
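
to make the distinction concrete, here is a minimal sketch in c11,
not the actual ainc source.  the names ainc_cas and ainc_xadd are
hypothetical, and i am assuming a compiler with <stdatomic.h>.  the
first retries a compare-and-swap until it wins, so the number of
iterations has no upper bound under contention.  the second is a
single atomic read-modify-write, which x86 compilers typically turn
into LOCK XADD.

```c
#include <stdatomic.h>

/*
 * hypothetical stand-in for a cmpxchg-based ainc.
 * lock-free but not wait-free: each failed compare-and-swap
 * means another processor got in first, and we go around again.
 */
long
ainc_cas(_Atomic long *p)
{
	long old;

	old = atomic_load(p);
	/* on failure, old is refreshed with the current value of *p */
	while(!atomic_compare_exchange_weak(p, &old, old + 1))
		;
	return old + 1;
}

/*
 * hypothetical fetch-and-add version.
 * one atomic read-modify-write and no retry loop in the code.
 */
long
ainc_xadd(_Atomic long *p)
{
	return atomic_fetch_add(p, 1) + 1;
}
```

both return the incremented value.  only the second finishes in a
bounded number of steps no matter what the other processors are
doing.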

- erik
