Re: [HACKERS] Why are we waiting?
On Mon, 2008-02-04 at 19:46, Simon Riggs wrote:
> We've got various facilities now for looking at LWLock waits, but I'd
> like to have more information about the *reasons* for lock waits.
>
> I know its possible to get backtraces in Dtrace at various tracepoints
> but that can be fairly hard to interpret.

Simon,

A couple of guys here at Sun have started looking at trying to improve
the scaling on Sun's T2000 Niagara servers (single socket, 32 HW
threads). We mostly use various DTrace scripts to monitor locking
usage. We'd be happy to share results, probes and DTrace scripts for
monitoring if you like.

So far we've found lock contention (especially for the ProcArrayLock)
to be the likely source of our inability to saturate the CPU with a
TPC-C-like OLTP load with >1000 users. The following shows the average
time spent in and waiting for the exclusive ProcArrayLock vs. the time
used in the transaction through commit (i.e., up until the exclusive
ProcArrayLock acquire that updates the PGPROC to mark the transaction
no longer running):

# ./pg-lwlock-procarray.d $(pgrep -n postgres)

              postgres`LWLockAcquire
              postgres`ProcArrayEndTransaction+0x10
              postgres`CommitTransaction+0xf0
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3b4
              postgres`PostgresMain+0x13a0
              postgres`BackendRun+0x27c
              postgres`BackendStartup+0xe0
              postgres`ServerLoop+0x1a0
              postgres`PostmasterMain+0xcbc
              postgres`main+0x1d8
              postgres`_start+0x108
               23

  avg lwlock acquire service time [ns]   193051989
  transaction-commit [count]                    23
  transaction-start [count]                     23
  avg transaction time [ns]               12763079

The stack trace shows that the only time the lock is acquired
exclusively is from the call to ProcArrayEndTransaction() in
CommitTransaction().

Also, an interesting observation is that the hot locks seem to have
changed from v8.2 to v8.3, making the ProcArrayLock more contended.
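To put those two averages in proportion, here's a quick back-of-the-
envelope check (a post-processing sketch, using only the figures
reported in the output above; it is not part of the DTrace script):

```python
# Ratio of average exclusive ProcArrayLock acquire time to average
# transaction time, taken from the pg-lwlock-procarray.d output above.
avg_lock_acquire_ns = 193_051_989  # avg lwlock acquire service time [ns]
avg_transaction_ns = 12_763_079    # avg transaction time [ns]

ratio = avg_lock_acquire_ns / avg_transaction_ns
print(f"acquiring the lock takes {ratio:.1f}x the transaction itself")
```

In other words, by these numbers a backend spends roughly 15 times
longer acquiring the exclusive ProcArrayLock at commit than it spends
executing the transaction itself.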
See the following outputs:

PostgreSQL 8.2 (32-bit):

-bash-3.00# ./825_lwlock_wait.d

          Lock Id            Mode           Count
 FirstLockMgrLock          Shared               2
    ProcArrayLock          Shared             209
       XidGenLock       Exclusive            1030
       XidGenLock          Shared            1215
    WALInsertLock       Exclusive            3942
  CLogControlLock          Shared            4113
    ProcArrayLock       Exclusive            6929
     WALWriteLock       Exclusive           17155
  CLogControlLock       Exclusive          128182

          Lock Id   Combined Time (ns)
 FirstLockMgrLock                24705
    WALInsertLock             79644210
       XidGenLock            179886846
    ProcArrayLock           1486950738
     WALWriteLock          18425400504
  CLogControlLock        1507388036453

PostgreSQL 8.3 (64-bit):

-bash-3.00# ./83_lwlock_wait.d

          Lock Id            Mode           Count
       SInvalLock       Exclusive               1
     WALWriteLock       Exclusive               1
       SInvalLock          Shared              20
  CLogControlLock          Shared             534
  CLogControlLock       Exclusive             845
       XidGenLock       Exclusive            1140
    ProcArrayLock          Shared            1821
    WALInsertLock       Exclusive           17043
    ProcArrayLock       Exclusive           49762

          Lock Id            Mode   Combined Time (ns)
       SInvalLock       Exclusive                17216
       SInvalLock          Shared               531129
     WALWriteLock       Exclusive              2003064
  CLogControlLock          Shared             61325819
  CLogControlLock       Exclusive             73509195
       XidGenLock       Exclusive            929477929
    WALInsertLock       Exclusive          17941476941
    ProcArrayLock          Shared          31114676303
    ProcArrayLock       Exclusive         888356047549

Regards,

Staale Smedseng
Database Technology Group, Sun Microsystems

---(end of broadcast)---
TIP 6: explain analyze is your friend
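Since the wait counts differ so much between locks, the combined
totals alone can mislead; average wait per acquisition is another way
to read the 8.3 table. A small post-processing sketch over the figures
above (again, not part of the original DTrace scripts):

```python
# Average wait per lock acquisition (combined wait time / wait count)
# for the most contended locks in the PostgreSQL 8.3 run above.
waits_83 = {
    # (lock, mode): (count, combined wait time in ns)
    ("ProcArrayLock", "Exclusive"): (49762, 888_356_047_549),
    ("ProcArrayLock", "Shared"):    (1821,  31_114_676_303),
    ("WALInsertLock", "Exclusive"): (17043, 17_941_476_941),
}

for (lock, mode), (count, total_ns) in waits_83.items():
    avg_ms = total_ns / count / 1e6
    print(f"{lock} ({mode}): {avg_ms:.2f} ms avg wait")
```

By this measure each exclusive ProcArrayLock wait in the 8.3 run
averages roughly 18 ms, an order of magnitude longer than the average
WALInsertLock wait.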
Re: [HACKERS] Why are we waiting?
> What is the frequency distribution of lock wait time on ProcArrayLock?

See below for wait time distributions for ProcArrayLock (both shared
and exclusive). The time measured is from entry into LWLockAcquire()
until return. I've recorded the same data at two different resolutions
(ms, and us for the lower part of the distribution). The DTrace script
is at the bottom.

These results are for 1000 TPC-C like clients, measured over the 1000
PostgreSQL processes over a period of 10 seconds.

> Does the distribution vary over time?

Hmm... I will have to get back to you on that one.

Staale

CPU     ID                    FUNCTION:NAME
  6  71245                      :tick-10sec

  Total LW_EXCLUSIVE                 25178
  Total Transaction Starts           25679
  Total LW_SHARED                   107211

  LW_SHARED [ms]
           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@                                       100565
              10 |                                         1
              20 |                                         0
              30 |                                         0
              40 |                                         0
              50 |                                         0
              60 |                                         0
              70 |                                         6
              80 |                                         304
              90 |@                                        1370
             100 |@                                        2685
             110 |@                                        1731
             120 |                                         307
             130 |                                         13
             140 |                                         0
             150 |                                         0
             160 |                                         5
             170 |                                         4
             180 |                                         26
             190 |                                         36
             200 |                                         24
             210 |                                         61
             220 |                                         49
             230 |                                         15
             240 |                                         0
             250 |                                         0
             260 |                                         0
             270 |                                         0
             280 |                                         2
             290 |                                         2
             300 |                                         0
             310 |                                         2
             320 |                                         1
             330 |                                         1
             340 |                                         0
             350 |                                         0
             360 |                                         0
             370 |                                         0
             380 |                                         0
             390 |                                         0
             400 |                                         1
             410 |                                         0

  LW_EXCLUSIVE [ms]
           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@                                       1565
              10 |                                         0
              20 |                                         0
              30 |                                         0
              40 |                                         0
              50 |                                         0
              60 |                                         0
              70 |                                         16
              80 |@                                        894
              90 |@@@                                      4108
             100 |@                                        8090
             110 |                                         4
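The histograms above are DTrace linear quantizations with 10 ms
buckets. For readers less familiar with that output, here is a minimal
Python model of the bucketing (the sample wait times are hypothetical,
purely to illustrate the bucket layout; this is not the measurement
script):

```python
# Model of DTrace-style lquantize(t, 0, 400, 10) bucketing as used for
# the ProcArrayLock wait histograms above.
def lquantize(value_ms, low=0, high=400, step=10):
    """Return the lower bound of the bucket a value falls into."""
    if value_ms < low:
        return "< 0"
    return min((value_ms - low) // step * step + low, high)

samples_ms = [2, 95, 104, 108, 212]  # hypothetical wait times
buckets = {}
for s in samples_ms:
    b = lquantize(s)
    buckets[b] = buckets.get(b, 0) + 1
print(buckets)  # count of samples per bucket lower bound
```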
Re: [HACKERS] Why are we waiting?
> I'm not sure 32-bit and 64-bit cases are going to be directly
> comparable. We could have a problem with cache line aliasing on only
> one or the other for example.

Agreed, this is likely comparing apples and oranges. I'll see if I can
get a one-to-one comparison done (these were the numbers I had close
by when writing the email).

> But that is a pretty striking difference. Does the 8.3 run complete
> more transactions in that time?

I'll make sure to include those numbers as well. :)

Staale
Re: [HACKERS] Why are we waiting?
On Thu, 2008-02-07 at 18:12, Simon Riggs wrote:
> I just realised you are using a lookup to get the text for the name of
> the lock. You used the same lookup table for both releases?

Oh, it wasn't quite that bad. :-) The two DTrace scripts had been
revised to correspond with the two different declarations of LWLockId
in lwlock.h (for 8.2.5 and 8.3 respectively).

But somehow the value for the last lock in our lookup table,
corresponding to

    FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS

in the enum, got turned into n+1 instead of n+NUM_BUFFER_PARTITIONS.
The other locks should have been output correctly, however.

But as Tom pointed out, the dynamic locks were not in the equation. So
now we're measuring all lock waits instead of assuming. :-)

Staale
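The slip can be illustrated with a toy model of the lookup table (the
constants here are hypothetical stand-ins, not the actual lwlock.h
values):

```python
# Toy model of the lookup-table bug described above. With
# FirstLockMgrLock defined as FirstBufMappingLock plus
# NUM_BUFFER_PARTITIONS, a table built with n+1 instead puts the
# "FirstLockMgrLock" label on a buffer-mapping partition lock.
NUM_BUFFER_PARTITIONS = 16  # hypothetical value
FirstBufMappingLock = 10    # hypothetical enum position

FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS

# The buggy table used n+1 rather than n+NUM_BUFFER_PARTITIONS:
buggy_lockmgr_slot = FirstBufMappingLock + 1

# So the slot labelled "FirstLockMgrLock" actually held waits for a
# buffer-mapping partition lock, not the lock-manager lock.
print(buggy_lockmgr_slot, "!=", FirstLockMgrLock)
```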
Re: [HACKERS] Why are we waiting?
On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
> I am wondering if the waits are being attributed to the right locks
> --- I remember such an error in a previous set of dtrace results, and
> some of the other details such as claiming shared lock delays but no
> exclusive lock delays for FirstLockMgrLock seem less than credible as
> well.

Good catch. We've checked the DTrace scripts against the respective
versions of lwlock.h, and the FirstLockMgrLock is off (these are
actually the results for FirstBufMappingLock). However, this is the
last lock in the enum that we trace; the lower-numbered lock enums
were correctly mapped (in particular the ProcArrayLock, which we've
been puzzled by).

Also, there was no clog buffer tweaking or pg_lock monitoring done
when this benchmark was run, AFAIK.

We'll redo the benchmarks and post new scripts and results.

Staale