http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48126
--- Comment #6 from Michael K. Edwards <m.k.edwards at gmail dot com> 2011-06-22 19:00:54 UTC ---
(In reply to comment #5)
> If I understand correctly however most cases wouldn't need it - I think most
> cases are use the compare&swap to take some form of lock, and then once you
> know you have the lock go and do your accesses - and in that case the ordering
> is guaranteed, where as if you couldn't take the lock you wouldn't use the
> subsequent access anyway.

Yes, that fits my understanding. It's only when you actually use the
compare-and-swap as a compare-and-swap that you can get bit. I expect that it
is quite hard to hit this in the 32-bit case, but with your 64-bit atomics and
a dual-core system it should be a little easier to expose. I have an
implementation of Michael-Scott lock-free queues (which rely on applying DCAS
to a counter+pointer), in which I currently use the assembly cmpxchg64
equivalent we discussed; I'll adapt it to use the GCC intrinsic and re-test.
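
For readers following along, here is a rough sketch of what "applying DCAS to a counter+pointer" can look like with the GCC intrinsic. It is not the queue implementation mentioned above. The names tagged_t, make_tagged, tagged_index, tagged_count, tagged_cas and tagged_demo are made up for illustration, and it assumes the "pointer" is a 32-bit node index, so that counter and index fit in one 64-bit word and a single 64-bit __sync_bool_compare_and_swap covers both.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (an assumption, not from the original comment):
   low 32 bits  = node index standing in for the pointer,
   high 32 bits = modification counter used to detect ABA. */
typedef uint64_t tagged_t;

static tagged_t make_tagged(uint32_t index, uint32_t count)
{
    return ((uint64_t)count << 32) | index;
}

static uint32_t tagged_index(tagged_t t) { return (uint32_t)t; }
static uint32_t tagged_count(tagged_t t) { return (uint32_t)(t >> 32); }

/* Try to swing *slot from the snapshot `expected` to `new_index`,
   bumping the counter by one. Returns nonzero on success. Here the
   CAS is used as a real compare-and-swap, not as a lock acquire:
   the caller acts on both the success and the failure result. */
static int tagged_cas(tagged_t *slot, tagged_t expected, uint32_t new_index)
{
    tagged_t desired = make_tagged(new_index, tagged_count(expected) + 1);
    return __sync_bool_compare_and_swap(slot, expected, desired);
}

/* Single-threaded walk-through: the first CAS succeeds, the second
   fails because its snapshot carries a stale counter. */
static void tagged_demo(void)
{
    tagged_t head = make_tagged(7, 0);
    tagged_t snapshot = head;

    assert(tagged_cas(&head, snapshot, 9));    /* counter 0 -> 1 */
    assert(!tagged_cas(&head, snapshot, 11));  /* snapshot is stale */
    assert(tagged_index(head) == 9);
    assert(tagged_count(head) == 1);
}
```

In a lock-free queue the caller typically re-reads shared state and retries after a failed tagged_cas, so both outcomes of the CAS are followed by further shared-memory accesses. That is the usage pattern the reply above distinguishes from the take-a-lock case. Whether a 64-bit __sync_bool_compare_and_swap is available inline depends on the target and compiler flags.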