On 07/24/2011 05:55 PM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> interesting - iirc we actually had some reports about current libpq
>> behaviour causing scaling issues on some OSes - see
>> http://archives.postgresql.org/pgsql-hackers/2009-06/msg00748.php and
>> some related threads. Iirc the final patch for that was never applied
Stefan Kaltenbrunner writes:
> interesting - iirc we actually had some reports about current libpq
> behaviour causing scaling issues on some OSes - see
> http://archives.postgresql.org/pgsql-hackers/2009-06/msg00748.php and
> some related threads. Iirc the final patch for that was never applied
>
Jeff Janes writes:
> How was this profile generated? I get a similar profile using
> --enable-profiling and gprof, but I find it not believable. The
> complete absence of any calls to libpq is not credible. I don't know
> about your profiler, but with gprof they should be listed in the call
> graph.
On 07/24/2011 03:50 AM, Jeff Janes wrote:
> On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner
> wrote:
>> On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote:
>>
>> [...]
>>
>>> all those tests are done with pgbench running on the same box - which
>> has a noticeable impact on the results because pgbench is using ~1 core
>> per 8 cores of the backend tested in cpu resources.
On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner
wrote:
> On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote:
>
> [...]
>
>> all those tests are done with pgbench running on the same box - which
>> has a noticeable impact on the results because pgbench is using ~1 core
>> per 8 cores of the backend tested in cpu resources.
On Jun12, 2011, at 23:39 , Robert Haas wrote:
> So, the majority (60%) of the excess spinning appears to be due to
> SInvalReadLock. A good chunk are due to ProcArrayLock (25%).
Hm, sizeof(LWLock) is 24 on X86-64, making sizeof(LWLockPadded) 32.
However, cache lines are 64 bytes large on recent Intel CPUs.
On 06/14/2011 02:27 AM, Jeff Janes wrote:
> On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner
> wrote:
> ...
>>
>>
>> so it seems that sysbench is actually significantly less overhead than
>> pgbench and the lower throughput at the higher concurrency seems to be
>> caused by sysbench being able to stress the backend even more than
>> pgbench can.
On Mon, Jun 13, 2011 at 9:09 PM, Alvaro Herrera
wrote:
> I noticed that pgbench's doCustom (the function highest in the profile
> posted) returns doing nothing if the connection is supposed to be
> "sleeping"; seems an open door for busy waiting.
On Mon, Jun 13, 2011 at 8:10 PM, Jeff Janes wrote:
> On Sun, Jun 12, 2011 at 2:39 PM, Robert Haas wrote:
> ...
>>
>> Profiling reveals that the system spends enormous amounts of CPU time
>> in s_lock. LWLOCK_STATS reveals that the only lwlock with significant
> amounts of blocking is the BufFreelistLock;
On Tue, Jun 14, 2011 at 13:09, Alvaro Herrera
wrote:
> I noticed that pgbench's doCustom (the function highest in the profile
> posted) returns doing nothing if the connection is supposed to be
> "sleeping"; seems an open door for busy waiting.
pgbench uses select() with/without timeout in that case.
Excerpts from Jeff Janes's message of lun jun 13 20:27:15 -0400 2011:
> On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner
> wrote:
> ...
> >
> >
> > so it seems that sysbench is actually significantly less overhead than
> > pgbench and the lower throughput at the higher concurrency seems to be
> > caused by sysbench being able to stress the backend even more than
> > pgbench can.
On 06/13/2011 07:55 AM, Stefan Kaltenbrunner wrote:
all those tests are done with pgbench running on the same box - which
has a noticeable impact on the results because pgbench is using ~1 core
per 8 cores of the backend tested in cpu resources - though I don't think
it causes any changes in the results.
On 06/13/2011 08:27 PM, Jeff Janes wrote:
pgbench sends each query (per connection) and waits for the reply
before sending another.
Do we know whether sysbench does that, or if it just stuffs the
kernel's IPC buffer full of queries without synchronously waiting for
individual replies?
On Tue, Jun 14, 2011 at 09:27, Jeff Janes wrote:
> pgbench sends each query (per connection) and waits for the reply
> before sending another.
We can use -j option to run pgbench in multiple threads to avoid
request starvation. What setting did you use, Stefan?
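For reference, a hypothetical pgbench invocation showing the -j option discussed above (connection count, thread count, duration, and database name are all made up for illustration):

```shell
# -c 80  => 80 client connections
# -j 8   => 8 pgbench worker threads, i.e. ~10 clients per thread;
#           with the default -j 1, a single thread must multiplex all
#           80 connections and can itself become the bottleneck.
# -T 60  => run for 60 seconds
# -S     => select-only workload
pgbench -c 80 -j 8 -T 60 -S bench
```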
On Mon, Jun 13, 2011 at 7:03 AM, Stefan Kaltenbrunner
wrote:
...
>
>
> so it seems that sysbench is actually significantly less overhead than
> pgbench and the lower throughput at the higher concurrency seems to be
> caused by sysbench being able to stress the backend even more than
> pgbench can.
On Sun, Jun 12, 2011 at 2:39 PM, Robert Haas wrote:
...
>
> Profiling reveals that the system spends enormous amounts of CPU time
> in s_lock. LWLOCK_STATS reveals that the only lwlock with significant
> amounts of blocking is the BufFreelistLock;
This is curious.
On Mon, Jun 13, 2011 at 10:29 AM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> On 06/12/2011 11:39 PM, Robert Haas wrote:
>>> Profiling reveals that the system spends enormous amounts of CPU time
>>> in s_lock.
>
>> just to reiterate that with numbers - at 160 threads with both patches
>> applied the profile looks like:
Stefan Kaltenbrunner writes:
> On 06/12/2011 11:39 PM, Robert Haas wrote:
>> Profiling reveals that the system spends enormous amounts of CPU time
>> in s_lock.
> just to reiterate that with numbers - at 160 threads with both patches
> applied the profile looks like:
> samples  %  image name
On 06/13/2011 01:55 PM, Stefan Kaltenbrunner wrote:
[...]
> all those tests are done with pgbench running on the same box - which
> has a noticeable impact on the results because pgbench is using ~1 core
> per 8 cores of the backend tested in cpu resources - though I don't think
> it causes any changes in the results.
On 06/12/2011 11:39 PM, Robert Haas wrote:
> Here is a patch that applies over the "reducing the overhead of
> frequent table locks" (fastlock-v3) patch and allows heavyweight VXID
> locks to spring into existence only when someone wants to wait on
> them. I believe there is a large benefit to be had from this optimization.
On 06/13/2011 02:29 PM, Kevin Grittner wrote:
> Stefan Kaltenbrunner wrote:
>
>> on that particular 40cores/80 threads box:
>
>> unpatched:
>
>> c40:tps = 107689.945323 (including connections establishing)
>> c80:tps = 101885.549081 (including connections establishing)
>
>> fast locks:
>
>> c40:tps = 215807.263233 (including connections establishing)
Stefan Kaltenbrunner wrote:
> on that particular 40cores/80 threads box:
> unpatched:
> c40:tps = 107689.945323 (including connections establishing)
> c80:tps = 101885.549081 (including connections establishing)
> fast locks:
> c40:tps = 215807.263233 (including connections establishing)
On 06/12/2011 11:39 PM, Robert Haas wrote:
> Here is a patch that applies over the "reducing the overhead of
> frequent table locks" (fastlock-v3) patch and allows heavyweight VXID
> locks to spring into existence only when someone wants to wait on
> them. I believe there is a large benefit to be had from this optimization.
On Sun, Jun 12, 2011 at 5:58 PM, Greg Stark wrote:
> On Sun, Jun 12, 2011 at 10:39 PM, Robert Haas wrote:
>> I hacked up the system to
>> report how often each lwlock spinlock exceeded spins_per_delay.
>
> I don't doubt the rest of your analysis but one thing to note, number
> of spins on a spinlock is not the same as the amount of time spent
> waiting for it.
On Sun, Jun 12, 2011 at 10:39 PM, Robert Haas wrote:
> I hacked up the system to
> report how often each lwlock spinlock exceeded spins_per_delay.
I don't doubt the rest of your analysis but one thing to note, number
of spins on a spinlock is not the same as the amount of time spent
waiting for it.
Here is a patch that applies over the "reducing the overhead of
frequent table locks" (fastlock-v3) patch and allows heavyweight VXID
locks to spring into existence only when someone wants to wait on
them. I believe there is a large benefit to be had from this
optimization.