Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Staale Smedseng
On Mon, 2008-02-04 at 19:46, Simon Riggs wrote:
> We've got various facilities now for looking at LWLock waits, but I'd
> like to have more information about the *reasons* for lock waits.
> 
> I know its possible to get backtraces in Dtrace at various tracepoints
> but that can be fairly hard to interpret.

Simon,

A couple of guys here at Sun have started looking into improving the
scaling on Sun's T2000 Niagara servers (a single chip with 32 hardware threads).

We mostly use various DTrace scripts to monitor locking usage. We'd be
happy to share results, probes and DTrace scripts for monitoring if you
like.

So far we've found lock contention (especially on the ProcArrayLock) to
be the likely reason we cannot saturate the CPU with a TPC-C-like OLTP
load of >1000 users.

The following shows the average time spent in, and waiting for, the
exclusive ProcArrayLock versus the time used in the transaction through
commit (i.e., up until the exclusive ProcArrayLock is acquired to update
PGPROC and mark the transaction as no longer running):

  # ./pg-lwlock-procarray.d $(pgrep -n postgres)

  postgres`LWLockAcquire
  postgres`ProcArrayEndTransaction+0x10
  postgres`CommitTransaction+0xf0
  postgres`CommitTransactionCommand+0x90
  postgres`finish_xact_command+0x60
  postgres`exec_execute_message+0x3b4
  postgres`PostgresMain+0x13a0
  postgres`BackendRun+0x27c
  postgres`BackendStartup+0xe0
  postgres`ServerLoop+0x1a0
  postgres`PostmasterMain+0xcbc
  postgres`main+0x1d8
  postgres`_start+0x108
   23
  avg lwlock acquire service time [ns]   193051989
  transaction-commit [count]                    23
  transaction-start [count]                     23
  avg transaction time [ns]               12763079

The stack trace shows that the only time the lock is acquired
exclusively is from the call to ProcArrayEndTransaction() in
CommitTransaction().
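
For reference, a script along these lines could be used (this is only a
sketch, not the actual pg-lwlock-procarray.d; the LWLockId 4 for
ProcArrayLock, mode 0 for LW_EXCLUSIVE, and the postgresql provider's
transaction probes are assumptions that should be checked against the
build being traced):

  #!/usr/sbin/dtrace -qs
  /*
   * Sketch only -- not the actual pg-lwlock-procarray.d.  Times
   * LWLockAcquire() for the exclusive ProcArrayLock in one backend and
   * compares it with transaction duration.  Assumes ProcArrayLock has
   * LWLockId 4 and LW_EXCLUSIVE is mode 0; verify against lwlock.h.
   */

  pid$1::LWLockAcquire:entry
  /arg0 == 4 && arg1 == 0/
  {
          self->lts = timestamp;
          @stacks[ustack()] = count();
  }

  pid$1::LWLockAcquire:return
  /self->lts/
  {
          @avgs["avg lwlock acquire service time [ns]"] =
              avg(timestamp - self->lts);
          self->lts = 0;
  }

  postgresql$1:::transaction-start
  {
          self->xts = timestamp;
          @counts["transaction-start [count]"] = count();
  }

  postgresql$1:::transaction-commit
  /self->xts/
  {
          @counts["transaction-commit [count]"] = count();
          @avgs["avg transaction time [ns]"] = avg(timestamp - self->xts);
          self->xts = 0;
  }

Letting DTrace print the aggregations when the script exits gives output
of the shape shown above.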

Also, an interesting observation is that the hot locks seem to have
changed from v8.2 to v8.3, making the ProcArrayLock more contended. See
the following outputs:

PostgreSQL 8.2 (32-bit):

  -bash-3.00# ./825_lwlock_wait.d

 Lock Id             Mode         Count
 FirstLockMgrLock    Shared           2
 ProcArrayLock       Shared         209
 XidGenLock          Exclusive     1030
 XidGenLock          Shared        1215
 WALInsertLock       Exclusive     3942
 CLogControlLock     Shared        4113
 ProcArrayLock       Exclusive     6929
 WALWriteLock        Exclusive    17155
 CLogControlLock     Exclusive   128182

 Lock Id             Combined Time (ns)
 FirstLockMgrLock                 24705
 WALInsertLock                 79644210
 XidGenLock                   179886846
 ProcArrayLock               1486950738
 WALWriteLock               18425400504
 CLogControlLock          1507388036453



PostgreSQL 8.3 (64-bit):

  -bash-3.00# ./83_lwlock_wait.d

 Lock Id             Mode         Count
 SInvalLock          Exclusive        1
 WALWriteLock        Exclusive        1
 SInvalLock          Shared          20
 CLogControlLock     Shared         534
 CLogControlLock     Exclusive      845
 XidGenLock          Exclusive     1140
 ProcArrayLock       Shared        1821
 WALInsertLock       Exclusive    17043
 ProcArrayLock       Exclusive    49762

 Lock Id             Mode         Combined Time (ns)
 SInvalLock          Exclusive                 17216
 SInvalLock          Shared                   531129
 WALWriteLock        Exclusive               2003064
 CLogControlLock     Shared                 61325819
 CLogControlLock     Exclusive              73509195
 XidGenLock          Exclusive             929477929
 WALInsertLock       Exclusive           17941476941
 ProcArrayLock       Shared              31114676303
 ProcArrayLock       Exclusive          888356047549
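
For completeness, the scripts producing these tables are roughly of the
following shape (a sketch, not the actual 825_lwlock_wait.d /
83_lwlock_wait.d; it assumes the postgresql provider's lwlock-startwait
and lwlock-endwait probes pass the LWLockId in arg0 and the mode in
arg1, and the id-to-name mapping must be taken from the lwlock.h of the
release being traced):

  #!/usr/sbin/dtrace -qs
  /*
   * Sketch only -- not the actual 825_lwlock_wait.d / 83_lwlock_wait.d.
   * Aggregates LWLock wait counts and combined wait time per lock id
   * across all backends.  Assumes lwlock-startwait/lwlock-endwait pass
   * (LWLockId, mode) and that mode 0 is LW_EXCLUSIVE; translating the
   * numeric lock id to a name must follow the traced release's lwlock.h.
   */

  postgresql*:::lwlock-startwait
  {
          self->wts[arg0] = timestamp;
  }

  postgresql*:::lwlock-endwait
  /self->wts[arg0]/
  {
          @counts[arg0, arg1 == 0 ? "Exclusive" : "Shared"] = count();
          @waittime[arg0] = sum(timestamp - self->wts[arg0]);
          self->wts[arg0] = 0;
  }

  dtrace:::END
  {
          printf("%10s %12s %10s\n", "Lock Id", "Mode", "Count");
          printa("%10d %12s %@10d\n", @counts);
          printf("\n%10s %22s\n", "Lock Id", "Combined Time (ns)");
          printa("%10d %@22d\n", @waittime);
  }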



Regards,
Staale Smedseng
Database Technology Group, Sun Microsystems





Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Staale Smedseng
> What is the frequency distribution of lock wait time on ProcArrayLock?

See below for wait time distributions for ProcArrayLock (both shared and
exclusive). The time measured is from entry into LWLockAcquire() until
return. I've recorded the same data in two different resolutions (ms,
and us for the lower part of the distribution). The DTrace script is at
the bottom.

These results are for 1000 TPC-C-like clients, measured across the 1000
PostgreSQL backend processes over a period of 10 seconds.

> Does the distribution vary over time?

Hmm... I will have to get back to you on that one.

Staale

CPU     ID                    FUNCTION:NAME
  6  71245                     :tick-10sec

  Total LW_EXCLUSIVE             25178
  Total Transaction Starts       25679
  Total LW_SHARED               107211

  LW_SHARED [ms]
       value      count
         < 0          0
           0     100565
          10          1
          20          0
          30          0
          40          0
          50          0
          60          0
          70          6
          80        304
          90       1370
         100       2685
         110       1731
         120        307
         130         13
         140          0
         150          0
         160          5
         170          4
         180         26
         190         36
         200         24
         210         61
         220         49
         230         15
         240          0
         250          0
         260          0
         270          0
         280          2
         290          2
         300          0
         310          2
         320          1
         330          1
         340          0
         350          0
         360          0
         370          0
         380          0
         390          0
         400          1
         410          0

  LW_EXCLUSIVE [ms]
       value      count
         < 0          0
           0       1565
          10          0
          20          0
          30          0
          40          0
          50          0
          60          0
          70         16
          80        894
          90       4108
         100       8090
         110          4
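
Since the script itself is cut off above, here is a minimal sketch of
how such a distribution can be collected with lquantize() (again
assuming ProcArrayLock is LWLockId 4 and LW_EXCLUSIVE is mode 0; it
traces a single backend, whereas the numbers above were gathered across
all backends):

  #!/usr/sbin/dtrace -qs
  /*
   * Sketch only -- not the script used for the numbers above.  Measures
   * LWLockAcquire() entry-to-return time for ProcArrayLock and reports
   * it as an lquantize() distribution per mode every 10 seconds.
   * Assumes ProcArrayLock has LWLockId 4 and LW_EXCLUSIVE is mode 0.
   */

  pid$1::LWLockAcquire:entry
  /arg0 == 4/
  {
          self->mode = arg1;
          self->ats = timestamp;
  }

  pid$1::LWLockAcquire:return
  /self->ats/
  {
          this->ms = (timestamp - self->ats) / 1000000;
          @dist[self->mode == 0 ? "LW_EXCLUSIVE [ms]" : "LW_SHARED [ms]"] =
              lquantize(this->ms, 0, 500, 10);
          @totals[self->mode == 0 ? "Total LW_EXCLUSIVE" : "Total LW_SHARED"] =
              count();
          self->ats = 0;
  }

  postgresql$1:::transaction-start
  {
          @totals["Total Transaction Starts"] = count();
  }

  tick-10sec
  {
          printa(@totals);
          printa(@dist);
          trunc(@totals);
          trunc(@dist);
  }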

Re: [HACKERS] Why are we waiting?

2008-02-06 Thread Staale Smedseng
> I'm not sure 32-bit and 64-bit cases are going to be directly
> comparable. We could have a problem with cache line aliasing on only
> one or the other, for example.

Agreed, this is likely comparing apples and oranges. I'll see if I can
get a one-to-one comparison done (these were the numbers I had close by
when writing the email).

> But that is a pretty striking difference. Does the 8.3 run complete
> more transactions in that time?

I'll make sure to include those numbers as well. :)

Staale



Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Staale Smedseng
On Thu, 2008-02-07 at 18:12, Simon Riggs wrote:

> I just realised you are using a lookup to get the text for the name of
> the lock. You used the same lookup table for both releases?

Oh, it wasn't quite that bad. :-) The two DTrace scripts had been
revised to match the two different declarations of LWLockId in lwlock.h
(for 8.2.5 and 8.3, respectively). But somehow the value for the last
lock in our lookup table, corresponding to

  FirstLockMgrLock = FirstBufMappingLock + NUM_BUFFER_PARTITIONS,

in the enum, got mapped to n+1 instead of n+NUM_BUFFER_PARTITIONS. The
other locks should have been reported correctly, however.
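
To make the slip concrete, the id-to-name lookup in such a script has to
leave a gap of NUM_BUFFER_PARTITIONS slots (16 by default in 8.3), not
one, between FirstBufMappingLock and FirstLockMgrLock. A hypothetical
fragment (the numeric ids are purely illustrative and must be read off
the release's lwlock.h):

  dtrace:::BEGIN
  {
          /*
           * Hypothetical id-to-name table; actual ids come from lwlock.h.
           * If FirstBufMappingLock has id N, then FirstLockMgrLock is
           * N + NUM_BUFFER_PARTITIONS (16 by default), not N + 1.
           */
          lockname[3] = "XidGenLock";
          lockname[4] = "ProcArrayLock";
          /* ... */
          lockname[25] = "FirstBufMappingLock";      /* example: N = 25 */
          lockname[25 + 16] = "FirstLockMgrLock";    /* not lockname[26] */
  }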

But as Tom pointed out, the dynamic locks were not in the equation. So
now we're measuring all lock waits instead of assuming. :-)

Staale





Re: [HACKERS] Why are we waiting?

2008-02-07 Thread Staale Smedseng
On Wed, 2008-02-06 at 19:55, Tom Lane wrote:
> I am wondering if the waits are being
> attributed to the right locks --- I remember such an error in a previous
> set of dtrace results, and some of the other details such as claiming
> shared lock delays but no exclusive lock delays for FirstLockMgrLock
> seem less than credible as well.

Good catch. We've checked the DTrace scripts against the respective
versions of lwlock.h, and the FirstLockMgrLock entry is off (those are
actually the results for FirstBufMappingLock).

However, this is the last lock in the enum that we trace; the other,
lower-numbered locks were correctly mapped (in particular the
ProcArrayLock, which we've been puzzled by).

Also, there was no clog buffer tweaking or pg_locks monitoring being
done when this benchmark was run, AFAIK.

We'll redo the benchmarks and post new scripts and results.

Staale


