On Sat, 07 May 2011 18:47:54 EDT erik quanstrom <quans...@quanstro.net> wrote:
> > Just guessing. Maybe the new code allows more concurrency? If the
> > value is not in the processor cache, will the old code block other
> > processors for much longer? The new code forces caching with the first
> > read, so there may be a high likelihood cmpxchg will finish faster. I
> > haven't studied x86 cache behavior so this guess could be completely
> > wrong. Suggest asking on comp.arch where people like Andy Glew can
> > give you a definitive answer.
>
> according to intel, this is a myth. search for "myth" in this page.
>
> http://software.intel.com/en-us/articles/implementing-scalable-atomic-locks-for-multi-core-intel-em64t-and-ia32-architectures/
>
> and this stands to reason, since both techniques revolve around a
> LOCK'd instruction, thus invoking the x86 architectural MESI(f)
> protocol.
>
> the difference, and my main point is that the loop in ainc means
> that it is not a wait-free algorithm. this is not only sub optimal,
> but also could lead to incorrect behavior.

I think a more likely reason for the change is to get a *copy* of the
value that was incremented. lock incl 0(ax) increments the word in
memory but won't tell you what the value was when it was incremented,
so the caller can't learn the result without a separate (racy) read.

But I don't see how the change would lead to incorrect behavior.
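
To make the two shapes under discussion concrete, here is a rough sketch in portable C11 rather than the actual Plan 9 assembly. The names ainc_cas and ainc_xadd are made up for illustration and are not the real ainc source. The first retries a compare-and-swap until it succeeds (lock-free, but a given caller can in principle keep losing the race, so not wait-free). The second is a single atomic fetch-and-add, which compilers typically turn into lock xadd on x86. Both hand back the incremented value, which a bare lock incl cannot do.

```c
#include <stdatomic.h>

/*
 * Hypothetical CAS-loop increment (illustrative only, not the Plan 9 code).
 * Returns the new value. The loop repeats whenever another processor
 * changed *p between our read and our compare-and-swap.
 */
static long ainc_cas(_Atomic long *p)
{
    long old = atomic_load(p);

    /* On failure, 'old' is refreshed with the current contents of *p. */
    while (!atomic_compare_exchange_weak(p, &old, old + 1))
        ;
    return old + 1;
}

/*
 * Hypothetical fetch-and-add increment (illustrative only).
 * One atomic read-modify-write with no retry loop; atomic_fetch_add
 * returns the value held before the addition, so add 1 for the new value.
 */
static long ainc_xadd(_Atomic long *p)
{
    return atomic_fetch_add(p, 1) + 1;
}
```

Starting from a counter holding 41, ainc_cas returns 42 and a following ainc_xadd returns 43. Under no contention the two are indistinguishable to the caller; the difference is only in what happens when several processors hit the same word at once.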