Jeff Janes <jeff.ja...@gmail.com> writes: > What scale factor and client count? How many cores per socket? It looks > like Sokolov was just starting to see gains at 200 clients on 72 cores, > using -N transaction.
I spent some time poking at this with the test scaffolding I showed at https://www.postgresql.org/message-id/17694.1504665058%40sss.pgh.pa.us AFAICT, there are not huge differences between different coding methods even for two processes in direct competition in the tightest possible atomic-ops loops. So if you're testing SQL operations, you're definitely not going to see anything without a whole lot of concurrent processes. Moreover, it matters which primitive you're testing, on which platform, with which compiler, because we have a couple of layers of atomic ops implementations. I started out testing pg_atomic_fetch_add_u32(), as shown in the above- mentioned message. On x86_x64 with gcc, that is implemented by the code for it in src/include/port/atomics/arch-x86.h. If you dike out that support, it falls back to the version in atomics/generic-gcc.h, which seems no worse and possibly better. Only if you also dike out generic-gcc.h do you get to the version in atomics/generic.h that this patch wants to change. However, at that point you're also down to a spinlock-based implementation of the underlying pg_atomic_read_u32_impl and pg_atomic_compare_exchange_u32_impl, which means that performance is going to be less than great anyway. No platform that we consider well-supported ought to be hitting the spinlock paths. This means that Sokolov's proposed changes in atomics/generic.h ought to be uninteresting for performance on any platform we care about --- at least for pg_atomic_fetch_add_u32(). However, Sokolov also proposes adding gcc-intrinsic support for pg_atomic_fetch_and_xxx and pg_atomic_fetch_or_xxx. This is a different kettle of fish. I repeated the experiments I'd done for pg_atomic_fetch_add_u32(), per the above message, using the "or" primitive, and found largely the same results as for "add": it's slightly better under contention than the generic code, and significantly better if the result value of the atomic op isn't needed. So I think we should definitely take the part of the patch that changes generic-gcc.h --- and we should check to see if we're missing out on any other gcc primitives we should be using there. I'm less excited about the proposed changes in generic.h. There's nothing wrong with them in principle, but they mostly shouldn't make any performance difference on any platform we care about, because we shouldn't get to that code in the first place on any platform we care about. If we are getting to that code for any specific primitive, then either there's a gap in the platform-specific or compiler-specific support, or it's debatable that we ought to consider that primitive to be very primitive. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers