Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-24 Thread Stefan Kaltenbrunner
On 07/24/2011 05:55 PM, Tom Lane wrote: > Stefan Kaltenbrunner writes: >> interesting - iirc we actually had some reports about current libpq >> behaviour causing scaling issues on some OSes - see >> http://archives.postgresql.org/pgsql-hackers/2009-06/msg00748.php and >> some related threads. Iir

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-24 Thread Tom Lane
Stefan Kaltenbrunner writes: > interesting - iirc we actually had some reports about current libpq > behaviour causing scaling issues on some OSes - see > http://archives.postgresql.org/pgsql-hackers/2009-06/msg00748.php and > some related threads. Iirc the final patch for that was never applied >

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-24 Thread Tom Lane
Jeff Janes writes: > How was this profile generated? I get a similar profile using > --enable-profiling and gprof, but I find it not believable. The > complete absence of any calls to libpq is not credible. I don't know > about your profiler, but with gprof they should be listed in the call > g

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-24 Thread Stefan Kaltenbrunner
On 07/24/2011 03:50 AM, Jeff Janes wrote: > On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner > wrote: >> On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote: >> >> [...] >> >>> all those tests are done with pgbench running on the same box - which >>> has a noticeable impact on the results becau

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-07-23 Thread Jeff Janes
On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner wrote: > On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote: > > [...] > >> all those tests are done with pgbench running on the same box - which >> has a noticeable impact on the results because pgbench is using ~1 core >> per 8 cores of the ba

Re: [HACKERS] lazy vxid locks, v1

2011-06-22 Thread Florian Pflug
On Jun12, 2011, at 23:39 , Robert Haas wrote: > So, the majority (60%) of the excess spinning appears to be due to > SInvalReadLock. A good chunk are due to ProcArrayLock (25%). Hm, sizeof(LWLock) is 24 on X86-64, making sizeof(LWLockPadded) 32. However, cache lines are 64 bytes large on recent I
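
A minimal sketch of the padding idea being discussed, with hypothetical type names rather than the actual lwlock.h definitions: if each lock is padded to a full 64-byte line instead of 32 bytes, two hot locks can never share a cache line (at the cost of doubling the array's footprint).

    #include <stdint.h>

    #define CACHE_LINE_SIZE 64          /* assumed line size on recent Intel CPUs */

    /* Stand-in for the real LWLock; roughly 24 bytes on x86-64, as noted above. */
    typedef struct LWLockSketch
    {
        volatile uint8_t mutex;         /* spinlock guarding the fields below */
        uint8_t          exclusive;     /* # of exclusive holders (0 or 1) */
        uint16_t         shared;        /* # of shared holders */
        void            *head;          /* wait-queue head */
        void            *tail;          /* wait-queue tail */
    } LWLockSketch;

    /* Pad each lock to a full cache line so neighbours never false-share one. */
    typedef union LWLockPaddedSketch
    {
        LWLockSketch lock;
        char         pad[CACHE_LINE_SIZE];
    } LWLockPaddedSketch;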

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-14 Thread Stefan Kaltenbrunner
On 06/14/2011 02:27 AM, Jeff Janes wrote: > On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner > wrote: > ... >> >> >> so it seems that sysbench is actually significantly less overhead than >> pgbench and the lower throughput at the higher concurrency seems to be >> caused by sysbench being able

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-14 Thread Jeff Janes
On Mon, Jun 13, 2011 at 9:09 PM, Alvaro Herrera wrote: > I noticed that pgbench's doCustom (the function highest in the profile > posted) returns doing nothing if the connection is supposed to be > "sleeping"; seems an open door for busy waiting.  I didn't check the > rest of the code to see if t

Re: [HACKERS] lazy vxid locks, v1

2011-06-14 Thread Robert Haas
On Mon, Jun 13, 2011 at 8:10 PM, Jeff Janes wrote: > On Sun, Jun 12, 2011 at 2:39 PM, Robert Haas wrote: > ... >> >> Profiling reveals that the system spends enormous amounts of CPU time >> in s_lock.  LWLOCK_STATS reveals that the only lwlock with significant >> amounts of blocking is the BufFre

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Itagaki Takahiro
On Tue, Jun 14, 2011 at 13:09, Alvaro Herrera wrote: > I noticed that pgbench's doCustom (the function highest in the profile > posted) returns doing nothing if the connection is supposed to be > "sleeping"; seems an open door for busy waiting. pgbench uses select() with/without timeout in the ca
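
For reference, a minimal sketch of the select()-with-timeout pattern Itagaki describes (a hypothetical helper, not pgbench's actual doCustom()/threadRun() code): the client blocks until the socket is readable or the remaining sleep time elapses, rather than being re-polled in a tight loop.

    #include <stddef.h>
    #include <sys/select.h>
    #include <sys/time.h>

    /* Returns >0 if the socket became readable, 0 on timeout, -1 on error. */
    static int
    wait_for_socket_or_timeout(int sock, long usec_remaining)
    {
        fd_set         input_mask;
        struct timeval timeout;

        FD_ZERO(&input_mask);
        FD_SET(sock, &input_mask);

        timeout.tv_sec = usec_remaining / 1000000L;
        timeout.tv_usec = usec_remaining % 1000000L;

        /* With no pending timeout, wait indefinitely for socket input. */
        return select(sock + 1, &input_mask, NULL, NULL,
                      usec_remaining >= 0 ? &timeout : NULL);
    }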

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Alvaro Herrera
Excerpts from Jeff Janes's message of Mon Jun 13 20:27:15 -0400 2011: > On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner > wrote: > ... > > > > > > so it seems that sysbench is actually significantly less overhead than > > pgbench and the lower throughput at the higher concurrency seems to be

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Greg Smith
On 06/13/2011 07:55 AM, Stefan Kaltenbrunner wrote: all those tests are done with pgbench running on the same box - which has a noticeable impact on the results because pgbench is using ~1 core per 8 cores of the backend tested in cpu resources - though I don't think it causes any changes in the re

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Greg Smith
On 06/13/2011 08:27 PM, Jeff Janes wrote: pgbench sends each query (per connection) and waits for the reply before sending another. Do we know whether sysbench does that, or if it just stuffs the kernel's IPC buffer full of queries without synchronously waiting for individual replies? sysb
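
To make the "waits for the reply before sending another" point concrete, here is a rough sketch of that one-query-in-flight pattern using libpq's asynchronous API (illustrative only; pgbench's real loop is more involved). With at most one statement outstanding per connection, per-connection throughput is bounded by the client/server round-trip time.

    #include <libpq-fe.h>
    #include <sys/select.h>

    static void
    run_one_query(PGconn *conn, const char *sql)
    {
        PGresult *res;

        PQsendQuery(conn, sql);                 /* queue the query, don't block */

        while (PQisBusy(conn))                  /* wait for the complete reply */
        {
            int    sock = PQsocket(conn);
            fd_set mask;

            FD_ZERO(&mask);
            FD_SET(sock, &mask);
            select(sock + 1, &mask, NULL, NULL, NULL);
            PQconsumeInput(conn);               /* pull bytes off the socket */
        }

        while ((res = PQgetResult(conn)) != NULL)   /* drain all results */
            PQclear(res);
    }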

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Itagaki Takahiro
On Tue, Jun 14, 2011 at 09:27, Jeff Janes wrote: > pgbench sends each query (per connection) and waits for the reply > before sending another. We can use -j option to run pgbench in multiple threads to avoid request starvation. What setting did you use, Stefan? >> for those curious - the profile

Re: pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Jeff Janes
On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner wrote: ... > > > so it seems that sysbench is actually significantly less overhead than > pgbench and the lower throughput at the higher concurrency seems to be > caused by sysbench being able to stress the backend even more than > pgbench can.

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Jeff Janes
On Sun, Jun 12, 2011 at 2:39 PM, Robert Haas wrote: ... > > Profiling reveals that the system spends enormous amounts of CPU time > in s_lock.  LWLOCK_STATS reveals that the only lwlock with significant > amounts of blocking is the BufFreelistLock; This is curious. Clearly the entire working set

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Robert Haas
On Mon, Jun 13, 2011 at 10:29 AM, Tom Lane wrote: > Stefan Kaltenbrunner writes: >> On 06/12/2011 11:39 PM, Robert Haas wrote: >>> Profiling reveals that the system spends enormous amounts of CPU time >>> in s_lock. > >> just to reiterate that with numbers - at 160 threads with both patches >> ap

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Tom Lane
Stefan Kaltenbrunner writes: > On 06/12/2011 11:39 PM, Robert Haas wrote: >> Profiling reveals that the system spends enormous amounts of CPU time >> in s_lock. > just to reiterate that with numbers - at 160 threads with both patches > applied the profile looks like: > samples  %  image

pgbench cpu overhead (was Re: [HACKERS] lazy vxid locks, v1)

2011-06-13 Thread Stefan Kaltenbrunner
On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote: [...] > all those tests are done with pgbench running on the same box - which > has a noticeable impact on the results because pgbench is using ~1 core > per 8 cores of the backend tested in cpu resources - though I don't think > it causes any cha

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Stefan Kaltenbrunner
On 06/12/2011 11:39 PM, Robert Haas wrote: > Here is a patch that applies over the "reducing the overhead of > frequent table locks" (fastlock-v3) patch and allows heavyweight VXID > locks to spring into existence only when someone wants to wait on > them. I believe there is a large benefit to be

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Stefan Kaltenbrunner
On 06/13/2011 02:29 PM, Kevin Grittner wrote: > Stefan Kaltenbrunner wrote: > >> on that particular 40cores/80 threads box: > >> unpatched: > >> c40:tps = 107689.945323 (including connections establishing) >> c80:tps = 101885.549081 (including connections establishing) > >> fast lo

Re: [HACKERS] lazy vxid locks, v1

2011-06-13 Thread Kevin Grittner
Stefan Kaltenbrunner wrote: > on that particular 40cores/80 threads box: > unpatched: > c40:tps = 107689.945323 (including connections establishing) > c80:tps = 101885.549081 (including connections establishing) > fast locks: > c40:tps = 215807.263233 (including connections e

Re: [HACKERS] lazy vxid locks, v1

2011-06-12 Thread Robert Haas
On Sun, Jun 12, 2011 at 5:58 PM, Greg Stark wrote: > On Sun, Jun 12, 2011 at 10:39 PM, Robert Haas wrote: >> I hacked up the system to >> report how often each lwlock spinlock exceeded spins_per_delay. > > I don't doubt the rest of your analysis but one thing to note, number > of spins on a spinl

Re: [HACKERS] lazy vxid locks, v1

2011-06-12 Thread Greg Stark
On Sun, Jun 12, 2011 at 10:39 PM, Robert Haas wrote: > I hacked up the system to > report how often each lwlock spinlock exceeded spins_per_delay. I don't doubt the rest of your analysis but one thing to note, number of spins on a spinlock is not the same as the amount of time spent waiting for i
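
A loose sketch of the spin-then-sleep behaviour in question (not the actual s_lock() code; the spin budget is adaptive in the real implementation): a waiter spins on the test-and-set up to a fixed budget, then sleeps between retries, so counting budget overruns counts waiters that hit the sleep path rather than the time they spent waiting.

    #include <stdatomic.h>
    #include <unistd.h>

    #define SPINS_PER_DELAY 100     /* assumed fixed budget for illustration */

    static void
    spin_lock_sketch(atomic_flag *lock, unsigned long *delay_count)
    {
        int spins = 0;

        while (atomic_flag_test_and_set(lock))     /* true while someone holds it */
        {
            if (++spins >= SPINS_PER_DELAY)
            {
                (*delay_count)++;                   /* what the report tallied */
                usleep(1000);                       /* back off ~1 ms, then retry */
                spins = 0;
            }
        }
    }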

[HACKERS] lazy vxid locks, v1

2011-06-12 Thread Robert Haas
Here is a patch that applies over the "reducing the overhead of frequent table locks" (fastlock-v3) patch and allows heavyweight VXID locks to spring into existence only when someone wants to wait on them. I believe there is a large benefit to be had from this optimization, because the combination
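
A very rough sketch of the "spring into existence only when someone wants to wait on them" idea, with hypothetical names and placeholder helpers (locking of the shared slot, memory barriers and error paths are all glossed over; this is not the patch itself): the common case merely publishes the VXID, and the heavyweight lock is only taken once a waiter actually shows up.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct VxidSketch
    {
        uint32_t backend_id;
        uint32_t local_xid;
    } VxidSketch;

    typedef struct ProcSlotSketch
    {
        VxidSketch vxid;            /* VXID advertised in shared memory */
        bool       lock_requested;  /* set when a waiter wants a real lock */
    } ProcSlotSketch;

    extern void heavyweight_lock_acquire(VxidSketch vxid);  /* placeholder */
    extern void heavyweight_lock_wait(VxidSketch vxid);     /* placeholder */

    /* Holder: materialize the heavyweight VXID lock only once it is asked for. */
    static void
    maybe_materialize_vxid_lock(ProcSlotSketch *me)
    {
        if (me->lock_requested)
            heavyweight_lock_acquire(me->vxid);
    }

    /* Waiter: flag the holder, then block on the (now real) heavyweight lock. */
    static void
    wait_for_vxid(ProcSlotSketch *holder)
    {
        holder->lock_requested = true;
        heavyweight_lock_wait(holder->vxid);
    }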