Heikki Linnakangas <hlinn...@iki.fi> writes:
> But let's go back to why we're considering this. The idea was to
> optimize this block:
> ...
> One trick that we could do is to replace that with a 128-bit atomic
> compare-and-swap instruction. Modern 64-bit Intel systems have that,
> it's called CMPXCHG16B. Don't know about other architectures. An atomic
> fetch-and-add, as envisioned in the comment above, would presumably be
> better, but I suspect that a compare-and-swap would be good enough to
> move the bottleneck elsewhere again.
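The fetch-and-add versus compare-and-swap trade-off can be illustrated on a single 64-bit counter. This is a minimal sketch, not the code under discussion (the original block is elided above); `reserve_faa` and `reserve_cas` are hypothetical names, and it assumes GCC or Clang for the `__sync` and `__atomic` builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Fetch-and-add: one atomic read-modify-write (typically LOCK XADD on
 * x86-64). Every caller succeeds on its first attempt; no retry loop. */
static inline uint64_t reserve_faa(uint64_t *pos, uint64_t nbytes)
{
    return __sync_fetch_and_add(pos, nbytes);   /* returns the old value */
}

/* Compare-and-swap loop: read the current value, compute the new one,
 * and publish it only if nobody changed *pos in the meantime. Losers
 * retry, so contention costs extra round trips, but the new value can
 * be any function of the old one (here it is checked before publishing). */
static inline uint64_t reserve_cas(uint64_t *pos, uint64_t nbytes)
{
    uint64_t old = __atomic_load_n(pos, __ATOMIC_RELAXED);

    for (;;)
    {
        assert(old + nbytes >= old);    /* position must not wrap around */
        uint64_t seen = __sync_val_compare_and_swap(pos, old, old + nbytes);
        if (seen == old)
            return old;                 /* our update was installed */
        old = seen;                     /* lost the race: retry from what we saw */
    }
}
```

Fetch-and-add can only apply a fixed increment to one word. A CAS loop can install any value computed from the old one, including a value that spans both 64-bit halves of a 16-byte word, at the price of retries when many backends collide.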
+1 for taking a look at that.  A bit of experimentation shows that
recent gcc and clang can generate that instruction using
__sync_bool_compare_and_swap or __sync_val_compare_and_swap on an
__int128 value.

			regards, tom lane
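A minimal sketch of the 128-bit case, assuming GCC or Clang on a 64-bit target that supports `unsigned __int128`. `pos_pair`, `reserve_pair`, `pair_cur` and `pair_prev` are hypothetical names; the two packed 64-bit positions merely stand in for whatever the elided block protects. On x86-64 the builtins are expanded inline as CMPXCHG16B only when the compiler targets CPUs that have it (for example with `-mcx16`), so the code checks `__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16` and otherwise falls back to a plain spinlock. `__sync_bool_compare_and_swap` reports only success or failure; the loop uses `__sync_val_compare_and_swap` because a failed attempt hands back the current contents, saving a separate reload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state: two 64-bit positions that must change
 * together ("cur" in the low half, "prev" in the high half), packed into
 * one 16-byte word so a single CMPXCHG16B can update both. */
typedef struct
{
    unsigned __int128 word __attribute__((aligned(16))); /* CMPXCHG16B needs 16-byte alignment */
    bool lock;                  /* used only by the fallback path */
} pos_pair;

static inline unsigned __int128 pack(uint64_t cur, uint64_t prev)
{
    return ((unsigned __int128) prev << 64) | cur;
}

/* Atomically set prev = cur and cur += nbytes; return the old cur,
 * i.e. the start of the reserved range. */
static inline uint64_t reserve_pair(pos_pair *p, uint64_t nbytes)
{
    assert(((uintptr_t) &p->word & 15) == 0);
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_16
    /* Start from a guess of 0; a failed CAS returns the real contents. */
    unsigned __int128 old = 0;

    for (;;)
    {
        uint64_t cur = (uint64_t) old;
        unsigned __int128 seen =
            __sync_val_compare_and_swap(&p->word, old, pack(cur + nbytes, cur));
        if (seen == old)
            return cur;         /* both halves updated in one step */
        old = seen;             /* someone else moved the pair: retry */
    }
#else
    /* No inline 16-byte CAS available (e.g. x86-64 without -mcx16). */
    while (__atomic_test_and_set(&p->lock, __ATOMIC_ACQUIRE))
        ;                       /* spin until the lock is free */
    uint64_t cur = (uint64_t) p->word;
    p->word = pack(cur + nbytes, cur);
    __atomic_clear(&p->lock, __ATOMIC_RELEASE);
    return cur;
#endif
}

/* Inspection helpers for single-threaded use only (not atomic). */
static inline uint64_t pair_cur(const pos_pair *p)  { return (uint64_t) p->word; }
static inline uint64_t pair_prev(const pos_pair *p) { return (uint64_t) (p->word >> 64); }
```

One way to check whether the instruction is actually generated is to compile with `-O2 -mcx16 -S` and look for `cmpxchg16b` in the assembly output.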