> And there can't be any livelock, since by definition somebody else
> _did_ make progress. In fact, adding the cpu_relax() probably just
> makes things much less fair - once somebody else raced on you, the
> cpu_relax() now makes it more likely that _another_ cpu does so too.
>
> That said, let's see what Tony's numbers are.
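
For anyone following along, the retry loop under discussion has roughly
the shape below. This is only a userspace sketch using C11 atomics, not
the actual kernel code: count_get() and relax_hint() are made-up names,
and relax_hint() is just a compiler barrier standing in for cpu_relax().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpu_relax(): only a compiler barrier here. */
static inline void relax_hint(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/*
 * Lockless increment via a compare-and-swap retry loop.
 * Returns how many times the compare-and-swap had to be retried.
 */
static unsigned int count_get(_Atomic uint64_t *count, bool relax)
{
	uint64_t old = atomic_load_explicit(count, memory_order_relaxed);
	unsigned int retries = 0;

	/*
	 * The strong variant only fails when *count != old, i.e. when
	 * somebody else changed the value first. On failure 'old' is
	 * refreshed with the current value, so we just go around again.
	 */
	while (!atomic_compare_exchange_strong_explicit(count, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed)) {
		retries++;
		if (relax)
			relax_hint();	/* the call being argued about */
	}
	return retries;
}
```

Each failed pass means another CPU's update went through, which is the
"no livelock" argument in the quote. The relax flag is there only so both
variants can be compared side by side.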

Data from 20 runs of "./t"

3.11 + Linus enabling patches, but ia64 not enabled (commit
bc08b449ee14a from Linus tree).

mean   3469553.800000
min    3367709.000000
max    3494154.000000
stddev = 43613.722742

Now add ia64 enabling (including the cpu_relax())

mean   5509067.150000  // nice boost
min    3191639.000000  // worst case is worse than worst case before we made the change
max    6508629.000000
stddev = 793243.943875 // much more variation from run to run

Comment out the cpu_relax()

mean   2185864.400000  // this sucks
min    2141242.000000
max    2286505.000000
stddev = 40847.960152  // but it consistently sucks

So Linus is right that the cpu_relax() makes things less fair ... but
without it performance sucks so much that I don't want to use the clever
cmpxchg at all - I'm much better off without it!

This may be caused by Itanium hyper-threading (SOEMT - switch on event
multi-threading) where the spinning thread means that its buddy retires
no instructions until h/w times it out and forces a switch. But that's
just a guess - losing the cacheline to whoever made the change that
caused the cmpxchg to fail should also force a thread switch.

-Tony