Re: [PERFORM] Best way to load test a postgresql server

2009-06-03 Thread Dimitri Fontaine
"Kenneth Cox"  writes:
> On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine
>  wrote:
>> I'd recommend having a look at tsung which will be able to replay a
>> typical application scenario with as many concurrent users as you want
>> to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
>>   http://tsung.erlang-projects.org/
>>   http://pgfouine.projects.postgresql.org/tsung.html
>
> I am having a look at tsung and not getting very far yet.  Have you had luck
> with it and do you really mean as many concurrent users as you want?

Last time I used it it was in the context of a web application and to
compare PostgreSQL against Informix after a migration. So I used the
HTTP protocol support of the injector.

Tsung is based on erlang and can be run from more than one node at any
time, last time I checked you could run 600 to 800 concurrent clients
from each node. Recent versions of erlang allow a much greater number
per node, one or two orders of magnitude greater, as I've been told by
Tsung's main developer.

>   I was
> hoping to use it to simulate my current load while tuning and making
> improvements.  So far tsung doesn't appear well suited to my needs.  I use
> persistent connections; each tsung session uses a new connection.  I have
> multiple applications that have very different usage patterns (some web and largely
> idle, some non web and almost saturated); tsung has virtual users choosing
> a session based on a probability with think times.  I know many  programming
> languages; tsung (and its error messages) is in erlang.

Tsung can be set up as an http or postgresql proxy: in this mode it'll
prepare session files for you while you use your application as
usual. The thinktime it sees will then get randomized at run time to
better reflect real usage.

You can define several user arrival phases to see what happens when the
load rises and then goes back to normal traffic. Lots of options, really.

Tsung generates statistics and comes with tools to analyze them and
provide graphs organized into a web page. One of those tools allows drawing
graphs from different simulations onto the same chart, with the same
scaling, to make it easy to compare results.

It seems to me tsung is a good tool for your use case.

Regards,
-- 
dim

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-03 Thread James Mansion

Greg Smith wrote:
3500 active connections across them.  That doesn't work, and what happens
is exactly the sort of context switch storm you're showing data for.
Think about it for a minute:  how many of those can really be doing
work at any time?  32, that's how many.  Now, you need some multiple
of the number of cores to try to make sure everybody is always busy,
but that multiple should be closer to 10X the number of cores rather
than 100X.

That's surely overly simplistic.  There is inherently nothing problematic
about having a lot of compute processes waiting for their timeslice, nor
of having IO- or semaphore-blocked processes waiting, and it doesn't
cause a context switch storm - this is a problem with postgres scalability,
not (inherently) lots of connections.  I'm sure most of us evaluating
Postgres from a background in Sybase or SQLServer would regard 5000
connections as no big deal.

This has the sniff of a badly contended spin-and-yield doesn't it?

I'd guess that if the yield were a sleep for a couple of milliseconds then
the lock holder would run and free everything up.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best way to load test a postgresql server

2009-06-03 Thread Shaul Dar
I considered Tsung myself but haven't tried it. If you intend to, I suggest
you read this excellent tutorial on using Tsung for load-testing
PostgreSQL.
While impressed, I decided the procedure was too daunting and went with
JMeter :-) It too can run tests from multiple clients, has built-in tables
and graphs, and you can save results as CSV or XML etc. In particular I
recommend adding the extension "listener" (JMeter term for anything that
captures and portrays test results) called Statistical Aggregate Report.

May the force be with you,

-- Shaul

On Wed, Jun 3, 2009 at 12:29 PM, Dimitri Fontaine wrote:

> "Kenneth Cox"  writes:
> > On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine
> >  wrote:
> >> I'd recommend having a look at tsung which will be able to replay a
> >> typical application scenario with as many concurrent users as you want
> >> to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
> >>   http://tsung.erlang-projects.org/
> >>   http://pgfouine.projects.postgresql.org/tsung.html
> >
> > I am having a look at tsung and not getting very far yet.  Have you had
> luck
> > with it and do you really mean as many concurrent users as you want?
>
> Last time I used it it was in the context of a web application and to
> compare PostgreSQL against Informix after a migration. So I used the
> HTTP protocol support of the injector.
>
> Tsung is based on erlang and can be run from more than one node at any
> time, last time I checked you could run 600 to 800 concurrent clients
> from each node. Recent versions of erlang allow a much greater number
> per node, one or two orders of magnitude greater, as I've been told by
> Tsung's main developer.
>
> >   I was
> > hoping to use it to simulate my current load while tuning and making
> > improvements.  So far tsung doesn't appear well suited to my needs.  I
> use
> > persistent connections; each tsung session uses a new connection.  I have
> > multiple applications that have very different usage patterns (some web and largely
> > idle, some non web and almost saturated); tsung has virtual users
> choosing
> > a session based on a probability with think times.  I know many
>  programming
> > languages; tsung (and its error messages) is in erlang.
>
> Tsung can be set up as an http or postgresql proxy: in this mode it'll
> prepare session files for you while you use your application as
> usual. The thinktime it sees will then get randomized at run time to
> better reflect real usage.
>
> You can define several user arrival phases to see what happens when the
> load rises and then goes back to normal traffic. Lots of options, really.
>
> Tsung generates statistics and comes with tools to analyze them and
> provide graphs organized into a web page. One of those tools allows drawing
> graphs from different simulations onto the same chart, with the same
> scaling, to make it easy to compare results.
>
> It seems to me tsung is a good tool for your use case.
>
> Regards,
> --
> dim
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Best way to load test a postgresql server

2009-06-03 Thread Kenneth Cox
On Wed, 03 Jun 2009 05:29:02 -0400, Dimitri Fontaine wrote:

Last time I used it it was in the context of a web application and to
compare PostgreSQL against Informix after a migration. So I used the
HTTP protocol support of the injector.


Tsung seems well suited for that.


Tsung is based on erlang...you could run 600 to 800 concurrent clients
from each node.


But each tsung session (virtual user) uses a separate PG connection, and I  
need 30k virtual users.  I can't imagine 30k PG connections.  I could  
imagine using pgbouncer in statement pooling mode, but that doesn't  
characterize my load well, where different PG connections have different  
profiles.  I have about 500 connections:


   ~450 from web servers, often idle, various work loads, no prepared statements
    50 from another client, mostly idle, small set of prepared statements
    10 from another client, extremely active, small set of prepared statements


I know a tsung session doesn't have to exactly mimic a user and I tried to  
coerce a tsung session to represent instead a DB client, with loops and  
multiple CSV files.  I wasn't so successful there, and was nagged by the  
assignment of sessions by probability, when I wanted a fixed number  
running each session.


I do appreciate the suggestions, and I agree Tsung has lots of nifty
features.  I used pgfouine to generate tsung sessions, and I love the graph
generation, but for me it comes down to simulating my DB load so that I can
profile and tune the DB.  I am not seeing how to get tsung to fit my case.


Next up I will try JMeter (thanks Shaul Dar for the suggestions).

Regards,
Ken

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-03 Thread Kevin Grittner
James Mansion  wrote: 
 
> I'm sure most of us evaluating Postgres from a background in Sybase
> or SQLServer would regard 5000 connections as no big deal.
 
Sure, but the architecture of those products is based around all the
work being done by "engines" which try to establish affinity to
different CPUs, and loop through the various tasks to be done.  You
don't get a context switch storm because you normally have the number
of engines set at or below the number of CPUs.  The down side is that
they spend a lot of time spinning around queue access to see if
anything has become available to do -- which causes them not to play
nice with other processes on the same box.
 
If you do connection pooling and queue requests, you get the best of
both worlds.  If that could be built into PostgreSQL, it would
probably reduce the number of posts requesting support for bad
configurations, and help with benchmarks which don't use proper
connection pooling for the product; but it would actually not add any
capability which isn't there if you do your homework.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] poor performing plan from analyze vs. fast default plan pre-analyze on new database

2009-06-03 Thread Davin Potts
Hi all --

In attempting to perform a particular query in postgres, I ran into
what appeared to be a pretty severe performance bottleneck.  I didn't
know whether I'd constructed my query stupidly, or I'd misconfigured
postgres in some suboptimal way, or what else might be going on.
Though I very much wanted to know the underlying cause of this
performance issue on my query, I was also facing the constraint that I
was working with a system in production and the tables my query
operated on could not be taken offline.  So I did the sensible thing
and captured a snapshot of the current production system via simple
pg_dump and played it back into a different postgres installation.
The snapshot copy was installed onto a "development system" with a
Pentium D class processor running postgres 8.1 on CentOS 4 32-bit
whereas the production copy was on a "production system" with a Core 2
Duo class processor running postgres 8.3 on OpenSUSE 10.2 64-bit.

I resubmitted a modified version of the query that'd given me problems
before, this time giving the same query to both systems at the same
time.  To my surprise, the query finished successfully on the
development system in under a minute.  I let the query continue to run
on the production system while I verified the results from the query
on the development system -- everything checked out.  It took more
than a day and a half for the query to complete on the production
system.  Granted, the production system was continuing to see inserts
to the relevant table while my query was running and granted that the
size of the production system's table had grown since the snapshot
(but not more than 50% larger), but I found it very difficult to
accept these conditions explaining such a massive difference in
performance of my query.  Here's the query and the response to
console, including \timing info, as submitted on the development
system:
smirkfp=# insert into dupids ( id ) select id from content where id
not in (select min(id) from content group by hash);
INSERT 0 437391
Time: 25733.394 ms

Here is the same query with the very different timing results seen on
the production system:
smirkfp=# insert into dupids ( id ) select id from content where id
not in (select min(id) from content group by hash);
INSERT 0 441592
Time: 142622702.430 ms

A little more background:  The table of interest, content, has around
1.5M rows on the production system and around 1.1M rows on the
development system at the time this query was run.  On both systems,
the smirkfp databases are centered around this table, content, and
have no other large tables or activities of interest outside this
table.  The database is sufficiently new that no time had been taken
to vacuum or analyze anything previously.  Neither the development nor
production system had noteworthy processor load or disk activity
outside postgres.  On the production system, the above long-running
query was pegging one of the processor cores at ~100% for almost the
whole time of the query's execution.

On the development system, I asked postgres to show me the execution
plan for my query:
smirkfp=# explain insert into dupids ( id ) select id from content
where id not in (select min(id) from content group by hash);
                                  QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on content  (cost=204439.55..406047.33 rows=565752 width=4)
  Filter: (NOT (hashed subplan))
  SubPlan
    ->  HashAggregate  (cost=204436.55..204439.05 rows=200 width=36)
          ->  Seq Scan on content  (cost=0.00..198779.03 rows=1131503 width=36)
(5 rows)


For comparison, I asked the same thing of the production system:
smirkfp=# explain insert into dupids ( id ) select id from content
where id not in (select min(id) from content group by hash);
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Seq Scan on content  (cost=493401.85..9980416861.63 rows=760071 width=4)
  Filter: (NOT (subplan))
  SubPlan
    ->  Materialize  (cost=493401.85..504915.85 rows=646400 width=37)
          ->  GroupAggregate  (cost=468224.39..487705.45 rows=646400 width=37)
                ->  Sort  (cost=468224.39..472024.74 rows=1520142 width=37)
                      Sort Key: public.content.hash
                      ->  Seq Scan on content  (cost=0.00..187429.42
rows=1520142 width=37)
(8 rows)


Looks pretty different.  Next, I thought I'd try asking the
development system to analyze the table, content, and see if that
changed anything:
smirkfp=# analyze content;
ANALYZE
smirkfp=# explain insert into dupids ( id ) select id from content
where id not in (select min(id) from content group by hash);
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Seq Scan on content  (cost=480291.35..7955136280.55 rows=582888 width=4)
   Filter: (NOT (subplan))
   SubPlan
     ->  Materialize  (cost=480291.35..492297.85 rows=656050 width=40)
           ->  GroupAggregate  (cost=457245.36..474189.30 rows=656050 width=40)
                 ->  Sort  (cost=457245.36..460159.80 rows=1165776 width=40)
                       Sort Key: hash
                       ->  Seq Scan on content  (cost=0.00..199121.76 rows=1165776 width=40)
(8 rows)

How to approach manipulating the execution plan back to something more
efficient?  What characteristics of the table could have induced
analyze to suggest the much slower query plan?

Re: [PERFORM] poor performing plan from analyze vs. fast default plan pre-analyze on new database

2009-06-03 Thread Grzegorz Jaśkiewicz
Postgresql isn't very efficient with subselects like that,
try:
explain select c.id from content c LEFT JOIN (select min(id) AS id
from content group by hash) cg ON cg.id=c.id WHERE cg.id is null;

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] poor performing plan from analyze vs. fast default plan pre-analyze on new database

2009-06-03 Thread Tom Lane
Davin Potts  writes:
> How to approach manipulating the execution plan back to something more
> efficient?  What characteristics of the table could have induced
> analyze to suggest the much slower query plan?

What's evidently happening is that the planner is backing off from using
a hashed subplan because it thinks the hashtable will require more than
work_mem.  Is 646400 a reasonably good estimate of the number of rows
that the sub-select will produce?  If it's a large overestimate, then
perhaps increasing the stats target for content.hash will help.  If
it's good, then what you want to do is increase work_mem to allow the
planner to use the better plan.
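A minimal way to test the work_mem theory in a psql session (the 64MB value
here is only an illustration, not a recommendation):

    smirkfp=# show work_mem;
    smirkfp=# set work_mem = '64MB';
    smirkfp=# explain insert into dupids ( id ) select id from content
              where id not in (select min(id) from content group by hash);

If the plan flips back to the hashed subplan at some setting, that value plus
some headroom is roughly what the hash table needs.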

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] poor performing plan from analyze vs. fast default plan pre-analyze on new database

2009-06-03 Thread Scott Carey

On 6/3/09 8:42 AM, "Davin Potts"  wrote:

> Hi all --
> 
> 
> A little more background:  The table of interest, content, has around
> 1.5M rows on the production system and around 1.1M rows on the
> development system at the time this query was run.  On both systems,
> the smirkfp databases are centered around this table, content, and
> have no other large tables or activities of interest outside this
> table.  The database is sufficiently new that no time had been taken
> to vacuum or analyze anything previously.  Neither the development nor
> production system had noteworthy processor load or disk activity
> outside postgres.  On the production system, the above long-running
> query was pegging one of the processor cores at ~100% for almost the
> whole time of the query's execution.
> 
> On the development system, I asked postgres to show me the execution
> plan for my query:
> smirkfp=# explain insert into dupids ( id ) select id from content
> where id not in (select min(id) from content group by hash);
>                                   QUERY PLAN
> --
> ---
>  Seq Scan on content  (cost=204439.55..406047.33 rows=565752 width=4)
>   Filter: (NOT (hashed subplan))
>   SubPlan
>     ->  HashAggregate  (cost=204436.55..204439.05 rows=200 width=36)
>           ->  Seq Scan on content  (cost=0.00..198779.03 rows=1131503
> width=36)
> (5 rows)
> 
> 
> For comparison, I asked the same thing of the production system:
> smirkfp=# explain insert into dupids ( id ) select id from content
> where id not in (select min(id) from content group by hash);
>                                         QUERY PLAN
> --
> ---
>  Seq Scan on content  (cost=493401.85..9980416861.63 rows=760071 width=4)
>   Filter: (NOT (subplan))
>   SubPlan
>     ->  Materialize  (cost=493401.85..504915.85 rows=646400 width=37)
>           ->  GroupAggregate  (cost=468224.39..487705.45 rows=646400 width=37)
>                 ->  Sort  (cost=468224.39..472024.74 rows=1520142 width=37)
>                       Sort Key: public.content.hash
>                       ->  Seq Scan on content  (cost=0.00..187429.42
> rows=1520142 width=37)
> (8 rows)
> 
> 
> Looks pretty different.  Next, I thought I'd try asking the
> development system to analyze the table, content, and see if that
> changed anything:
> smirkfp=# analyze content;
> ANALYZE
> smirkfp=# explain insert into dupids ( id ) select id from content
> where id not in (select min(id) from content group by hash);
>                                         QUERY PLAN
> --
> ---
>  Seq Scan on content  (cost=480291.35..7955136280.55 rows=582888 width=4)
>   Filter: (NOT (subplan))
>   SubPlan
>     ->  Materialize  (cost=480291.35..492297.85 rows=656050 width=40)
>           ->  GroupAggregate  (cost=457245.36..474189.30 rows=656050 width=40)
>                 ->  Sort  (cost=457245.36..460159.80 rows=1165776 width=40)
>                       Sort Key: hash
>                       ->  Seq Scan on content  (cost=0.00..199121.76
> rows=1165776 width=40)
> (8 rows)
> 
> How to approach manipulating the execution plan back to something more
> efficient?  What characteristics of the table could have induced
> analyze to suggest the much slower query plan?
> 

When the table was analyzed, it found many more distinct values for the hash
column than the default assumption (the distinct-value estimate).  If Postgres
thinks the hash-aggregate plan won't fit in work_mem, it will go to a sort ->
group-aggregate plan even if it estimates the sort plan to take thousands of
times more effort.

Solutions:

1.  If this estimate is wrong, try increasing the statistics target on the
column(s) in question and re-analyzing.  Or, for a test, change the global
default_statistics_target and experiment.  In the explain queries below, it
appears as though this approach may have only a moderate effect.  Is the
estimate of 650,000 distinct values for the hash column accurate?  (See the
sketch at the end of these options.)

2.  Increase work_mem so that it can fit the hash in memory and use that
plan and not have to sort the whole table.  The below explain on your
production db thinks it needs to hash into about ~650,000 distinct buckets of
rows of width 37.  That should fit in 32MB RAM or so.
Try work_mem of 16MB, 32MB, and 64MB (and perhaps even 128MB or larger on the
test box) and see if the explain changes.

  set work_mem = '32MB';
  explain <your query here>;

To see what your current work_mem is, do:
  show work_mem;

If this approach works, you can either set this work_mem before running the
query, or globally change it.  It is set low by default because if all your
connections are busy doing work that requires work_mem, you can end up using
RAM at up to about (2 * work_mem * active connections).
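A rough sketch of how to check option 1 above (the statistics target value of
200 is just an example, and n_distinct may come back negative, meaning a
fraction of the row count):

  alter table content alter column hash set statistics 200;
  analyze content;
  select n_distinct from pg_stats
   where tablename = 'content' and attname = 'hash';

If the estimate stays around 650,000 after raising the target, it is probably
about right and work_mem is the knob to turn.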

On my larger db, the back-end aggregate processing connections use wo

Re: [PERFORM] Scalability in postgres

2009-06-03 Thread Dimitri
A few weeks ago I tested a customer application on 16 cores with Oracle:
  - 20,000 sessions in total
  - 70,000 queries/sec

without any problem on a mid-range Sun box + Solaris 10..

Rgds,
-Dimitri

On 6/3/09, Kevin Grittner  wrote:
> James Mansion  wrote:
>
>> I'm sure most of us evaluating Postgres from a background in Sybase
>> or SQLServer would regard 5000 connections as no big deal.
>
> Sure, but the architecture of those products is based around all the
> work being done by "engines" which try to establish affinity to
> different CPUs, and loop through the various tasks to be done.  You
> don't get a context switch storm because you normally have the number
> of engines set at or below the number of CPUs.  The down side is that
> they spend a lot of time spinning around queue access to see if
> anything has become available to do -- which causes them not to play
> nice with other processes on the same box.
>
> If you do connection pooling and queue requests, you get the best of
> both worlds.  If that could be built into PostgreSQL, it would
> probably reduce the number of posts requesting support for bad
> configurations, and help with benchmarks which don't use proper
> connection pooling for the product; but it would actually not add any
> capability which isn't there if you do your homework.
>
> -Kevin
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-03 Thread Kevin Grittner
Dimitri  wrote: 
> Few weeks ago tested a customer application on 16 cores with Oracle:
>   - 20,000 sessions in total
>   - 70,000 queries/sec
> 
> without any problem on a mid-range Sun box + Solaris 10..
 
I'm not sure what point you are trying to make.  Could you elaborate?
 
(If it's that Oracle doesn't need an external connection pool, then
are you advocating that PostgreSQL include that in the base product?)
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-03 Thread Scott Carey


On 6/3/09 10:45 AM, "Kevin Grittner"  wrote:

> Dimitri  wrote:
>> Few weeks ago tested a customer application on 16 cores with Oracle:
>>   - 20,000 sessions in total
>>   - 70,000 queries/sec
>> 
>> without any problem on a mid-range Sun box + Solaris 10..
> 
> I'm not sure what point you are trying to make.  Could you elaborate?
> 
> (If it's that Oracle doesn't need an external connection pool, then
> are you advocating that PostgreSQL include that in the base product?)
> 

Here is how I see it -- not speaking for Dimitri.

Although Oracle's connections are heavyweight and expensive to create,
having many of them around and idle does not affect scalability much if at
all.

Postgres could fix its connection scalability issues -- that is entirely
independent of connection pooling.

In most other databases (all others that I have used), pooling merely
prevents the expense of connection creation/destruction and helps save some
RAM and not much else.
The fact that it affects scalability and performance beyond that so
dramatically in Postgres is a problem with Postgres.



> -Kevin
> 
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-03 Thread Dimitri
Just to say you don't need a mega server to keep thousands of connections
open with Oracle; it's just trivial, and you don't need the CPU affinity and
other stuff you may or may not need with Sybase :-)

Regarding PostgreSQL, I think it would only benefit from having an integrated
connection pooler, as it would make all populations happy anyway:
  - those who don't like the idea can always disable it :-)
  - those who have a lot of mostly inactive sessions will be happy to
simplify session pooling
  - those who are really seeking the most optimal workload on their
servers will be happy twice: if there are any PG scalability limits, an
integrated pooler will in most cases be more performant than an external one;
if there are no PG scalability limits, it'll still help to size PG
most optimally according to HW or OS capacities.

Rgds,
-Dimitri


On 6/3/09, Kevin Grittner  wrote:
> Dimitri  wrote:
>> Few weeks ago tested a customer application on 16 cores with Oracle:
>>   - 20,000 sessions in total
>>   - 70,000 queries/sec
>>
>> without any problem on a mid-range Sun box + Solaris 10..
>
> I'm not sure what point you are trying to make.  Could you elaborate?
>
> (If it's that Oracle doesn't need an external connection pool, then
> are you advocating that PostgreSQL include that in the base product?)
>
> -Kevin
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-03 Thread Robert Haas
On Wed, Jun 3, 2009 at 2:12 PM, Scott Carey  wrote:
> Postgres could fix its connection scalability issues -- that is entirely
> independent of connection pooling.

Really?  I'm surprised.  I thought the two were very closely related.
Could you expand on your thinking here?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] degenerate performance on one server of 3

2009-06-03 Thread Erik Aronesty
I think, perhaps, autovac wasn't running on that machine.

Is there any way to check to see if it's running?

I have enabled all the options, and I know it's running on my other
servers because I see

LOG: autovacuum  entries (a profusion of them)

I suspect, perhaps, that it's just not showing up in the log since my
8.2 BSD box came with different settings by default.

current settings:

autovacuum = on
stats_start_collector = on  # needed for block or row stats
stats_row_level = on
log_min_error_statement = error
log_min_messages = notice
log_destination = 'syslog'
client_min_messages = notice

should be enough to get it going and for me to see it, right?  Not
sure which setting controls logging of autovac, nor am I sure of a way
to *ask* the server if autovac is running.
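I guess something like the following would at least confirm the values the
running server is actually using (though not whether the daemon is doing
anything):

  show autovacuum;
  select name, setting from pg_settings
   where name in ('autovacuum', 'stats_start_collector', 'stats_row_level');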

On Mon, Jun 1, 2009 at 10:06 AM, Tom Lane  wrote:
> Erik Aronesty  writes:
>> but why wasn't autovac enough to reclaim at least *most* of the space?
>
> Autovac isn't meant to reclaim major amounts of bloat; it's more in the
> line of trying to prevent it from happening in the first place.  To
> reclaim bloat it would have to execute VACUUM FULL, or some other
> operation that requires exclusive table lock, which doesn't seem like
> a good idea for an automatic background operation.
>
>                        regards, tom lane
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-03 Thread Scott Carey


On 6/3/09 11:39 AM, "Robert Haas"  wrote:

> On Wed, Jun 3, 2009 at 2:12 PM, Scott Carey  wrote:
>> Postgres could fix its connection scalability issues -- that is entirely
>> independent of connection pooling.
> 
> Really?  I'm surprised.  I thought the two were very closely related.
> Could you expand on your thinking here?
> 

They are closely related only by coincidence of Postgres' flaws.
If Postgres did not scale so poorly as idle connections increase (or as
active ones increase), connection pools would rarely be needed at all.

Most connection pools in clients (JDBC, ODBC, for example) are designed to
limit the connection create/close count, not the number of idle connections.
They reduce creation/deletion specifically by leaving connections idle for a
while to allow re-use. . .

Other things that can be called "connection concentrators" differ in that
they are additionally trying to put a band-aid over server design flaws that
make idle connections hurt scalability.  Or to prevent resource consumption
issues that the database doesn't have enough control over on its own (again,
a flaw -- a server should be as resilient to bad client behavior and its
resource consumption as possible).


Most 'modern' server designs throttle active actions internally.  Apache's
(very old, and truly somewhat 1995-ish) process or thread per connection
model is being abandoned for event driven models in the next version, so it
can scale like the higher performing web servers to 20K+ keep-alive
connections with significantly fewer threads / processes.

SQL is significantly more complicated than HTTP and requires a lot more
state which dictates a very different design, but nothing about it requires
idle connections to cause reduced SMP scalability.

In addition to making sure idle connections have almost no impact on
performance (just eat up some RAM), scalability as active queries increase
is important.  Although the OS is responsible for a lot of this, there are
many things that the application can do to help out.  If Postgres had a
"max_active_connections" parameter for example, then the memory used by
work_mem would be related to this value and not max_connections.  This would
further make connection poolers/concentrators less useful from a performance
and resource management perspective.

Once the above is done, connection pooling, whether integrated or provided
by a third party, would mostly only have value for clients who cannot pool
or cache connections on their own.  This is the state of connection pooling
with most other DB's today.

> ...Robert
> 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] degenerate performance on one server of 3

2009-06-03 Thread Tom Lane
Erik Aronesty  writes:
> I think, perhaps, autovac wasn't running on that machine.
> Is there any way to check to see if it's running?

> I have enabled all the options , and I know it's running on my other
> servers because I see

> LOG: autovacuum  entries (a profusion of them)

> I suspect, perhaps, that it's just not showing up in the log since my
> 8.2 BSD box came with different settings by default.

8.2 has far crummier support for logging what autovacuum is doing than
8.3 does :-(.  The settings you show should mean that it's running, but
the only way to check specifically is to crank log_min_messages way up,
which will clutter your log with a lot of useless noise along with
autovacuum's messages.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Pointers needed on optimizing slow SQL statements

2009-06-03 Thread Janine Sisk
I've been Googling for SQL tuning help for Postgres but the pickings  
have been rather slim.  Maybe I'm using the wrong search terms.  I'm  
trying to improve the performance of the following query and would be  
grateful for any hints, either directly on the problem at hand, or to  
resources I can read to find out more about how to do this.  In the  
past I have fixed most problems by adding indexes to get rid of  
sequential scans, but in this case it appears to be the hash join and  
the nested loops that are taking up all the time and I don't really  
know what to do about that.  In Google I found mostly references from  
people wanting to use a hash join to *fix* a performance problem, not  
deal with it creating one...


My Postgres version is 8.3.3, on Linux.

Thanks in advance,

janine

iso=# explain analyze select  a.item_id,
iso-#
iso-#                 content_item__get_best_revision(a.item_id) as revision_id,
iso-#                 content_item__get_latest_revision(a.item_id) as last_revision_id,
iso-#                 content_revision__get_number(a.article_id) as revision_no,
iso-#                 (select count(*) from cr_revisions where item_id=a.item_id) as revision_count,
iso-#
iso-#                 -- Language support
iso-#                 b.lang_id,
iso-#                 b.lang_key,
iso-#                 (case when b.lang_key = 'big5' then '#D7D7D7' else '#ff' end) as tr_bgcolor,
iso-#                 coalesce(dg21_item_langs__rel_lang(b.lang_id,'gb2312'),'0') as gb_item_id,
iso-#                 coalesce(dg21_item_langs__rel_lang(b.lang_id,'iso-8859-1'),'0') as eng_item_id,
iso-#
iso-#                 -- user defined data
iso-#                 a.article_id,
iso-#                 a.region_id,
iso-#                 a.author,
iso-#                 a.archive_status,
iso-#                 a.article_status,
iso-#                 case when a.archive_status='t'
iso-#                      then 'never expire'
iso-#                      else to_char(a.archive_date, '年MM月DD日')
iso-#                 end as archive_date,
iso-#
iso-#                 -- Standard data
iso-#                 a.article_title,
iso-#                 a.article_desc,
iso-#                 a.creation_user,
iso-#                 a.creation_ip,
iso-#                 a.modifying_user,
iso-#
iso-#                 -- Pretty format data
iso-#                 a.item_creator,
iso-#
iso-#                 -- Other data
iso-#                 a.live_revision,
iso-#                 to_char(a.publish_date, '年MM月DD日') as publish_date,
iso-#                 to_char(a.creation_date, 'DD/MM/ HH:MI AM') as creation_date,
iso-#
iso-#                 case when article_status='approved'
iso-#                      then 'admin content, auto approved'
iso-#                      when article_status='unapproved'
iso-#                      then (select approval_text
iso(#                            from   dg21_approval
iso(#                            where  revision_id=a.article_id
iso(#                            and    approval_status='f' order by approval_date desc limit 1)
iso-#                      else  ''
iso-#                 end as approval_text
iso-#
iso-# from    dg21_article_items a, dg21_item_langs b
iso-# where   a.item_id = b.item_id
iso-#
iso-# order by b.lang_id desc, a.item_id
iso-# limit 21 offset 0;

                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3516.97..3516.98 rows=1 width=1245) (actual time=195948.132..195948.250 rows=21 loops=1)
   ->  Sort  (cost=3516.97..3516.98 rows=1 width=1245) (actual time=195948.122..195948.165 rows=21 loops=1)
         Sort Key: b.lang_id, ci.item_id
         Sort Method:  top-N heapsort  Memory: 24kB
         ->  Nested Loop  (cost=719.67..3516.96 rows=1 width=1245) (actual time=346.687..195852.741 rows=4159 loops=1)
               ->  Nested Loop  (cost=719.67..3199.40 rows=1 width=413) (actual time=311.422..119467.334 rows=4159 loops=1)
                     ->  Nested Loop  (cost=719.67..3198.86 rows=1 width=400) (actual time=292.951..1811.051 rows=4159 loops=1)
                           ->  Hash Join  (cost=719.67..3197.98 ro

Re: [PERFORM] Pointers needed on optimizing slow SQL statements

2009-06-03 Thread Tom Lane
Janine Sisk  writes:
> I've been Googling for SQL tuning help for Postgres but the pickings  
> have been rather slim.  Maybe I'm using the wrong search terms.  I'm  
> trying to improve the performance of the following query and would be  
> grateful for any hints, either directly on the problem at hand, or to  
> resources I can read to find out more about how to do this.  In the  
> past I have fixed most problems by adding indexes to get rid of  
> sequential scans, but in this case it appears to be the hash join and  
> the nested loops that are taking up all the time and I don't really  
> know what to do about that.  In Google I found mostly references from  
> people wanting to use a hash join to *fix* a performance problem, not  
> deal with it creating one...

The hashjoin isn't creating any problem that I can see.  What's
hurting you is the nestloops above it, which need to be replaced with
some other join technique.  The planner is going for a nestloop because
it expects only one row out of the hashjoin, which is off by more than
three orders of magnitude :-(.  So in short, your problem is poor
estimation of the selectivity of this condition:

>   Join Filter: ((ci.live_revision =  
> cr.revision_id) OR ((ci.live_revision IS NULL) AND (cr.revision_id =  
> content_item__get_latest_revision(ci.item_id

It's hard to tell why the estimate is so bad, though, since you didn't
provide any additional information.  Perhaps increasing the statistics
target for these columns (or the whole database) would help.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pointers needed on optimizing slow SQL statements

2009-06-03 Thread Janine Sisk
Ok, I will look into gathering better statistics.  This is the first  
time I've had a significant problem with a PG database, so this is  
uncharted territory for me.


If there is more info I could give that would help, please be more  
specific about what you need and I will attempt to do so.


Thanks!

janine

On Jun 3, 2009, at 2:42 PM, Tom Lane wrote:


Janine Sisk  writes:

I've been Googling for SQL tuning help for Postgres but the pickings
have been rather slim.  Maybe I'm using the wrong search terms.  I'm
trying to improve the performance of the following query and would be
grateful for any hints, either directly on the problem at hand, or to
resources I can read to find out more about how to do this.  In the
past I have fixed most problems by adding indexes to get rid of
sequential scans, but in this case it appears to be the hash join and
the nested loops that are taking up all the time and I don't really
know what to do about that.  In Google I found mostly references from
people wanting to use a hash join to *fix* a performance problem, not
deal with it creating one...


The hashjoin isn't creating any problem that I can see.  What's
hurting you is the nestloops above it, which need to be replaced with
some other join technique.  The planner is going for a nestloop  
because

it expects only one row out of the hashjoin, which is off by more than
three orders of magnitude :-(.  So in short, your problem is poor
estimation of the selectivity of this condition:


 Join Filter: ((ci.live_revision =
cr.revision_id) OR ((ci.live_revision IS NULL) AND (cr.revision_id =
content_item__get_latest_revision(ci.item_id


It's hard to tell why the estimate is so bad, though, since you didn't
provide any additional information.  Perhaps increasing the statistics
target for these columns (or the whole database) would help.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org 
)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


---
Janine Sisk
President/CEO of furfly, LLC
503-693-6407





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best way to load test a postgresql server

2009-06-03 Thread Greg Smith

On Tue, 2 Jun 2009, Shaul Dar wrote:

If you want to test the H/W and configuration of your DBMS then you can 
use the pgbench tool (which uses a specific built-in DB+schema, 
following the TPC benchmark).


There are a lot of TPC benchmarks.  pgbench simulates TPC-B (badly), which 
is a benchmark from 1990.  It's not at all representative of the current 
TPC benchmarks.


If you want to load test your own specific DB then I am unaware of any 
such tools.


pgbench will run against any schema and queries, the built-in set are just 
the easiest to use.  I just released a bunch of slides and a package I 
named pgbench-tools that show some of the possibilities here, links to 
everything are at: 
http://notemagnet.blogspot.com/2009/05/bottom-up-postgresql-benchmarking-and.html


I'd mentioned working on this before on this list, but the code just 
got stable enough to release recently.  Anybody who is running lots of 
pgbench tests at different database sizes and client loads might benefit 
from using my toolset to automate running the tests and reporting on the 
results.


The last few slides of my pgbench presentation show how you might write a 
custom test that measures how fast rows of various sizes can be inserted 
into your database at various client counts.
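As a trivial sketch of what such a custom script can look like (the table
name and sizes here are made up, not from my slides; create the table first,
then feed the two-line script to pgbench with -f at a few different client
counts):

  create table insert_test (k integer, padding text);

  -- contents of the pgbench script file:
  \setrandom k 1 1000000
  insert into insert_test (k, padding) values (:k, repeat('x', 100));

Varying the length passed to repeat() is an easy way to test different row
sizes.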


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pointers needed on optimizing slow SQL statements

2009-06-03 Thread Robert Haas
On Wed, Jun 3, 2009 at 6:04 PM, Janine Sisk  wrote:
> Ok, I will look into gathering better statistics.  This is the first time
> I've had a significant problem with a PG database, so this is uncharted
> territory for me.
>
> If there is more info I could give that would help, please be more specific
> about what you need and I will attempt to do so.
>
> Thanks!
>
> janine

You might find it helpful to try to inline the
content_item__get_latest_revision function call.  I'm not sure whether
that's a SQL function or what, but the planner isn't always real
clever about things like that.  If you can redesign things so that all
the logic is in the actual query, you may get better results.

But, we're not always real clever about selectivity.  Sometimes you
have to fake the planner out, as discussed here.

http://archives.postgresql.org/pgsql-performance/2009-06/msg00023.php

Actually, I had to do this today on a production application.  In my
case, the planner thought that a big OR clause was not very selective,
so it figured it wouldn't have to scan very far through the outer side
before it found enough rows to satisfy the LIMIT clause.  Therefore it
materialized the inner side instead of hashing it, and when the
selectivity estimate turned out to be wrong, it took 220 seconds to
execute.  I added a fake join condition of the form a || b = a || b,
where a and b were on different sides of the join, and now it hashes
the inner side and takes < 100 ms.
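To make the shape of that hack concrete, here is a sketch with invented table
and column names (not the ones from my case); note that the concatenation
trick silently drops rows where either column is NULL:

  select o.*
    from big_table o
    join small_table s
      on o.sid = s.id
     -- always true for non-null values; it only exists to drag the planner's
     -- selectivity estimate down so it stops expecting the LIMIT to be
     -- satisfied after scanning just a little of the outer side
     and (o.sid::text || s.id::text) = (o.sid::text || s.id::text)
   order by o.created_at desc
   limit 20;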

Fortunately, these kinds of problems are fairly rare, but they can be
extremely frustrating to debug.  With any kind of query debugging, the
first question to ask yourself is "Are any of my selectivity estimates
way off?".  If the answer to that question is no, you should then ask
"Where is all the time going in this plan?".  If the answer to the
first question is yes, though, your time is usually better spent
fixing that problem, because once you do, the plan will most likely
change to something a lot better.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Query plan issues - volatile tables

2009-06-03 Thread Brian Herlihy

Hi,

We have a problem with some of our query plans.  One of our tables is quite 
volatile, but postgres always uses the last statistics snapshot from the last 
time it was analyzed for query planning.  Is there a way to tell postgres that 
it should not trust the statistics for this table?  Basically we want it to 
assume that there may be 0, 1 or 100,000 entries coming out from a query on 
that table at any time, and that it should not make any assumptions.

Thanks,
Brian
 
Brian Herlihy
Trellian Pty Ltd
+65 67534396 (Office)
+65 92720492 (Handphone)



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pointers needed on optimizing slow SQL statements

2009-06-03 Thread Janine Sisk
I'm sorry if this is a stupid question, but...  I changed  
default_statistics_target from the default of 10 to 100, restarted PG,  
and then ran "vacuumdb -z" on the database.  The plan is exactly the  
same as before.  Was I supposed to do something else?  Do I need to  
increase it even further?  This is an overloaded system to start with,  
so I'm being fairly conservative with what I change.


thanks,

janine

On Jun 3, 2009, at 2:42 PM, Tom Lane wrote:


Janine Sisk  writes:

I've been Googling for SQL tuning help for Postgres but the pickings
have been rather slim.  Maybe I'm using the wrong search terms.  I'm
trying to improve the performance of the following query and would be
grateful for any hints, either directly on the problem at hand, or to
resources I can read to find out more about how to do this.  In the
past I have fixed most problems by adding indexes to get rid of
sequential scans, but in this case it appears to be the hash join and
the nested loops that are taking up all the time and I don't really
know what to do about that.  In Google I found mostly references from
people wanting to use a hash join to *fix* a performance problem, not
deal with it creating one...


The hashjoin isn't creating any problem that I can see.  What's
hurting you is the nestloops above it, which need to be replaced with
some other join technique.  The planner is going for a nestloop  
because

it expects only one row out of the hashjoin, which is off by more than
three orders of magnitude :-(.  So in short, your problem is poor
estimation of the selectivity of this condition:


 Join Filter: ((ci.live_revision =
cr.revision_id) OR ((ci.live_revision IS NULL) AND (cr.revision_id =
content_item__get_latest_revision(ci.item_id


It's hard to tell why the estimate is so bad, though, since you didn't
provide any additional information.  Perhaps increasing the statistics
target for these columns (or the whole database) would help.

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org 
)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


---
Janine Sisk
President/CEO of furfly, LLC
503-693-6407





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] degenerate performance on one server of 3

2009-06-03 Thread Reid Thompson

Erik Aronesty wrote:

I think, perhaps, autovac wasn't running on that machine.

Is there any way to check to see if it's running?



since it looks like stats are on too

http://www.network-theory.co.uk/docs/postgresql/vol3/ViewingCollectedStatistics.html

read the entry on pg_stat_all_tables
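for example, something along these lines (assuming the last-vacuum timestamp
columns that 8.2 added are present) shows when autovacuum last touched each
table; if the autovacuum columns are NULL everywhere, it has likely never run:

  select relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    from pg_stat_user_tables
   order by relname;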

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Pointers needed on optimizing slow SQL statements

2009-06-03 Thread Scott Marlowe
On Wed, Jun 3, 2009 at 8:32 PM, Janine Sisk  wrote:
> I'm sorry if this is a stupid question, but...  I changed
> default_statistics_target from the default of 10 to 100, restarted PG, and
> then ran "vacuumdb -z" on the database.  The plan is exactly the same as
> before.  Was I supposed to do something else?  Do I need to increase it even
> further?  This is an overloaded system to start with, so I'm being fairly
> conservative with what I change.

No need to restart pg, just analyze is good enough (vacuumdb -z will do).

After that, compare your explain analyze output and see if the
estimates are any better.  If they're better but not good enough, try
increasing stats target to something like 500 or 1000 (max is 1000)
and reanalyze and see if that helps.  If not, post the new explain
analyze and we'll take another whack at it.
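If it turns out only a couple of columns are the problem (for instance the
ones in that join filter), you can also raise the target just for those
instead of globally.  A sketch with placeholder table/column names, since we
don't know the real ones behind the ci/cr aliases:

  alter table content_items alter column live_revision set statistics 1000;
  analyze content_items;

Then rerun the explain analyze and compare estimated vs. actual row counts on
the nodes that were off before.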

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance