On machines with lots of CPU cores, pgbench can start eating up a lot of system time. Investigation reveals that the problem is with random(), which glibc implements like this:
long int
__random ()
{
  int32_t retval;

  __libc_lock_lock (lock);

  (void) __random_r (&unsafe_state, &retval);

  __libc_lock_unlock (lock);

  return retval;
}

weak_alias (__random, random)

Rather obviously, if you're running enough pgbench threads, you're going to have a pretty ugly point of contention there. On the 32-core machine provided by Nate Boley, with my usual 5-minute SELECT-only test, lazy-vxid and sinval-fastmessages applied, and scale factor 100, "time" shows that pgbench uses almost as much system time as user time:

$ time pgbench -n -S -T 300 -c 64 -j 64
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 300 s
number of transactions actually processed: 55319555
tps = 184396.016257 (including connections establishing)
tps = 184410.926840 (excluding connections establishing)

real    5m0.019s
user    21m10.100s
sys     17m45.480s

I patched it to use random_r() - the patch is attached - and here are the (rather gratifying) results of that test:

$ time ./pgbench -n -S -T 300 -c 64 -j 64
transaction type: SELECT only
scaling factor: 100
query mode: simple
number of clients: 64
number of threads: 64
duration: 300 s
number of transactions actually processed: 71851589
tps = 239503.585813 (including connections establishing)
tps = 239521.816698 (excluding connections establishing)

real    5m0.016s
user    20m40.880s
sys     9m25.930s

Since a client-limited benchmark isn't very interesting, I think this change makes sense. Thoughts? Objections? Coding style improvements?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
random_r.patch
Description: Binary data
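
The attachment is not reproduced inline, so for readers who have not used random_r() before, here is a minimal sketch of the general idea: each thread owns its own generator state, so no shared lock is taken on the hot path. This is not the attached patch. The thread_rng type and both function names are hypothetical, chosen only for illustration, and the 128-byte state buffer is an assumed size. random_r() and initstate_r() are glibc extensions, so this assumes a glibc platform.

```c
#define _DEFAULT_SOURCE   /* exposes random_r()/initstate_r() on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-thread generator; not taken from the attached patch. */
typedef struct thread_rng
{
    struct random_data data;   /* glibc's reentrant generator state */
    char statebuf[128];        /* backing storage; size is an assumption */
} thread_rng;

/* Seed one thread's private state. Returns 0 on success, -1 on error. */
static int
thread_rng_init(thread_rng *rng, unsigned int seed)
{
    /* glibc expects random_data to be zeroed before initstate_r() */
    memset(rng, 0, sizeof(*rng));
    return initstate_r(seed, rng->statebuf, sizeof(rng->statebuf),
                       &rng->data);
}

/* Draw the next value from this thread's state; no global lock involved. */
static long
thread_rng_next(thread_rng *rng)
{
    int32_t result;
    int rc = random_r(&rng->data, &result);

    assert(rc == 0);
    (void) rc;                 /* silence unused warning under NDEBUG */
    return result;
}
```

Each worker thread would call thread_rng_init() once with its own seed and then thread_rng_next() wherever it previously called random(). Because the state lives in the caller's struct, two threads never touch the same memory, which is what removes the contention described above.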