On Fri, Mar 30, 2012 at 1:55 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Actually, what is really bugging me is that I cannot find any way of
> getting a profile that reflects the *time* spent waiting rather than
> merely the *number* of waits.  That seems like an obvious thing to
> want, and I cannot find a way to get it.
I ended up taking a stab at writing some userspace code for this - see
attached.  It measures the amount of time taken for each contended lwlock
acquisition and prints a crude histogram of the results at
postmaster-shutdown time.  At least on the IBM POWER7 machine, running
pgbench with 32 threads, where pg_test_timing shows that getting the time
takes less than a microsecond 96%+ of the time, this seemed to have no real
impact on the tps numbers - perhaps because the workload is I/O-bound.

Risking the possible ire of people who object to large attachments, I'm
attaching the results this generated on a 30-minute, 32-thread pgbench run
at scale factor 300.  To minimize said ire, I've run the file through
bzip2.  What's interesting about this is that, while there is plenty of
waiting for the usual suspects - ProcArrayLock (4), WALInsertLock (7), and
CLogControlLock (11) - the waits are all pretty short:

2012-03-31 02:33:25 UTC [50305] LOG: lock 0: 2:520, 3:2222
2012-03-31 02:33:25 UTC [50305] LOG: lock 3: 1:9, 2:36838
2012-03-31 02:33:25 UTC [50305] LOG: lock 4: 1:33, 2:216964
2012-03-31 02:33:25 UTC [50305] LOG: lock 7: 1:39, 2:406249
2012-03-31 02:33:25 UTC [50305] LOG: lock 8: 1:4, 2:34
2012-03-31 02:33:25 UTC [50305] LOG: lock 11: 1:99, 2:374559
2012-03-31 02:33:25 UTC [50305] LOG: lock 17: 2:24, 3:24

That's saying that there were over 400,000 contended acquisitions of
WALInsertLock, but even the longest one had fls(wait_time_in_us) = 2, or
in other words it took less than 4us to get the lock.  So what happens if
we grep the log file for the biggest offenders?

2012-03-31 02:33:25 UTC [50305] LOG: lock 204610: 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 272958: 23:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 325412: 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 325784: 21:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 360016: 5:1, 21:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 444886: 23:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 499890: 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 533418: 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 610484: 6:1, 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 897798: 22:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 1027128: 7:1, 20:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 1074256: 5:1, 21:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 1132586: 5:1, 23:1
2012-03-31 02:33:25 UTC [50305] LOG: lock 1312178: 16:1, 22:1

If I'm reading this right, fls(wait_time_in_us) = 20 means a delay of
roughly a second, which I think means that a couple of those waits - the
ones in bucket 23 - took somewhere between *4 and 8 seconds*.  gdb reveals
that in the test configuration, all odd-numbered locks between 169 and
2097319 are some buffer's io_in_progress_lock, and all even-numbered locks
between 170 and 2097320 are some buffer's content_lock, which means, if
I'm not confused here, that every single lwlock-related stall > 1s
happened while waiting for a buffer content lock.  Moreover, each event
affected a different buffer.  I find this result so surprising that I have
a hard time believing that I haven't screwed something up, so if anybody
can check over the patch and this analysis and suggest what that thing
might be, I would appreciate it.  I would be a lot less surprised if the
waits revolved around the io_in_progress locks, since it's not that hard
to imagine an I/O taking a really long time on a busy system.  But I
didn't think we were ever supposed to hold content locks for that long.
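In case the bucket arithmetic isn't obvious, here is a minimal standalone
sketch of how I'm reading those numbers.  This is just an illustration,
not the attached patch, and my_fls()/record_wait() are made-up names: the
histogram keys on fls() of the wait time in microseconds, so bucket b
covers waits from 2^(b-1) to 2^b microseconds, putting bucket 20 at
roughly half a second to a second and bucket 23 at roughly 4 to 8 seconds.

	/*
	 * Sketch of the fls()-based bucketing described above - an
	 * illustration only, not the attached patch.
	 */
	#include <stdio.h>
	#include <stdint.h>

	#define NBUCKETS 32

	/* 1-based position of the most significant set bit; 0 for v == 0 */
	static int
	my_fls(uint32_t v)
	{
		int		bit = 0;

		while (v)
		{
			bit++;
			v >>= 1;
		}
		return bit;
	}

	/* one histogram per lock in the real thing; just one here */
	static uint64_t counts[NBUCKETS];

	/* called once per contended acquisition with the measured wait time */
	static void
	record_wait(uint32_t wait_us)
	{
		counts[my_fls(wait_us)]++;
	}

	int
	main(void)
	{
		/* a ~700ms wait and a ~5s wait, like the outliers in the log */
		record_wait(700000);		/* lands in bucket 20 */
		record_wait(5000000);		/* lands in bucket 23 */

		/* bucket b holds waits t with 2^(b-1) us <= t < 2^b us */
		for (int b = 1; b < NBUCKETS; b++)
		{
			if (counts[b] == 0)
				continue;
			printf("%d:%llu  (%u us <= t < %u us)\n",
				   b, (unsigned long long) counts[b],
				   1u << (b - 1), 1u << b);
		}
		return 0;
	}

Running that prints 20:1 as 524288-1048576 us and 23:1 as 4194304-8388608
us, which is where the 4-to-8-second reading for the bucket-23 entries
above comes from.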
The other thing that baffles me about these numbers is that they don't
provide any fodder for explaining the periodic drops in throughput that
happen when the system checkpoints.  I had assumed they would show up as
long lwlock waits, like somebody hanging on to WALInsertLock while
everybody else in the system piles up behind them.  That's not reflected
in these numbers - the few very long waits show just ONE guy waiting a
really long time for the lock.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
lwhistogram.patch
Description: Binary data
log.wp.lwhistogram.32.300.1800.bz2
Description: BZip2 compressed data