Re: [PERFORM] how does pg handle concurrent queries and same queries

2008-07-28 Thread Faludi Gábor
Hi,

here is what the original query was which was obviously nonsense :
EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, s.elofordulas FROM letoltes
INNER JOIN (select letoltes.cid, count(letoltes.cid) AS elofordulas FROM
letoltes GROUP BY cid) s ON s.cid=letoltes.cid ORDER BY s.elofordulas DESC
LIMIT 5;
 
QUERY PLAN



 Limit  (cost=73945.35..73945.65 rows=5 width=12) (actual
time=4191.396..4351.966 rows=5 loops=1)
   ->  Unique  (cost=73945.35..77427.99 rows=58800 width=12) (actual
time=4191.390..4351.956 rows=5 loops=1)
 ->  Sort  (cost=73945.35..75106.23 rows=464351 width=12) (actual
time=4191.386..4283.545 rows=175944 loops=1)
   Sort Key: s.elofordulas, letoltes.cid
   ->  Merge Join  (cost=9257.99..30238.65 rows=464351 width=12)
(actual time=652.535..2920.304 rows=464351 loops=1)
 Merge Cond: ("outer".cid = "inner".cid)
 ->  Index Scan using idx_letoltes_cid on letoltes
(cost=0.00..12854.51 rows=464351 width=4) (actual time=0.084..1270.588
rows=464351 loops=1)
 ->  Sort  (cost=9257.99..9258.73 rows=294 width=12)
(actual time=652.434..810.941 rows=464176 loops=1)
   Sort Key: s.cid
   ->  Subquery Scan s  (cost=9242.26..9245.94
rows=294 width=12) (actual time=651.343..652.028 rows=373 loops=1)
 ->  HashAggregate  (cost=9242.26..9243.00
rows=294 width=4) (actual time=651.339..651.661 rows=373 loops=1)
   ->  Seq Scan on letoltes
(cost=0.00..6920.51 rows=464351 width=4) (actual time=0.014..307.469
rows=464351 loops=1)
 Total runtime: 4708.434 ms
(13 sor)

However after fixing the query this is 1/4 th of the time  but still blocks
the site :


EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, count(letoltes.cid)  AS
elofordulas FROM letoltes GROUP BY cid ORDER BY elofordulas DESC LIMIT 5;
   QUERY PLAN

-
 Limit  (cost=9255.05..9255.09 rows=5 width=4) (actual time=604.734..604.743
rows=5 loops=1)
   ->  Unique  (cost=9255.05..9257.26 rows=294 width=4) (actual
time=604.732..604.737 rows=5 loops=1)
 ->  Sort  (cost=9255.05..9255.79 rows=294 width=4) (actual
time=604.730..604.732 rows=5 loops=1)
   Sort Key: count(cid), cid
   ->  HashAggregate  (cost=9242.26..9243.00 rows=294 width=4)
(actual time=604.109..604.417 rows=373 loops=1)
 ->  Seq Scan on letoltes  (cost=0.00..6920.51
rows=464351 width=4) (actual time=0.022..281.413 rows=464351 loops=1)
 Total runtime: 604.811 ms


here is the table : 
\d letoltes
 TĂĄbla "public.letoltes"
 Oszlop | TĂ­pus  |  MĂłdosĂ­tĂł
+-+
 id | integer | not null default nextval('letoltes_seq'::text)
 cid| integer |
Indexes:
"idx_letoltes_cid" btree (cid)
"idx_letoltes_id" btree (id)

select count(1) from letoltes;
 count

 464351


VACUM ANALYZE runs overnight every day.

thanks,
Gabor

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Claus Guttesen
Sent: Monday, July 28, 2008 8:56 AM
To: Faludi Gábor
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] how does pg handle concurrent queries and same
queries

> I have taken over the maintenance of a server farm , recently. 2 webserver
> on  db server. They are quite powerful 2 processor xeon w/ 6Gig of ram .
>
> Couple of days ago we had a serious performance hit and the db server (pg.
> v7.4) was overloaded w/ something in a way that operating system was
almost
> not able to respond or in cases it did not.
>
> After some analysis i suspect that there is a query that takes up to 1
> second and that is the cause. Upon  each page loading this query fires and
> takes the one second and blocks the page to load completly . The load was
> roughly ~300 connections in one minute .
>
> So here are my questions :
>
> . Why does the second and the later queries take the whole on
second
> if the dataset is the same . Shouldn't PG realise that the query is the
same
> so i give the user the same resultset ?
>
> . How do I know if one query blocks the other ?
>
> . Is there a way to log the long running queries in 7.4 ? If not
is
> it available in any newer version ?

Can you post the queries? Can you provide an 'analyze explain'? Do you
perform a 'vacuum analyze' on a regular basis?

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

-- 
Sent via pgsql-performance mailing lis

Re: [PERFORM] how does pg handle concurrent queries and same queries

2008-07-28 Thread Craig Ringer
Faludi Gábor wrote:

> . Why does the second and the later queries take the whole on second
> if the dataset is the same . Shouldn't PG realise that the query is the same
> so i give the user the same resultset ?

That would require a result cache. I don't know if Pg even has a query
result cache - I don't think so, but I'm not sure. Even if it does, it'd
still only be useful if the queries were issued under *exactly* the same
conditions - in other words, no writes had been made to the database
since the cached query was issued, and the first query had committed
before the second began (or was read-only). Additionally, no volatile
functions could be called in the query, because their values/effects
might be different when the query is executed a second time. That
includes triggers, etc.

Since 7.4 doesn't do lazy xid allocation it can't really tell that
nothing has been changed since the previous query was cached. So, if I'm
not missing something here, a query result cache would be useless anyway.

> . How do I know if one query blocks the other ?

Examination of pg_catalog.pg_locks is certainly a start. It's trickier
with lots of short-running queries, though.

> . Is there a way to log the long running queries in 7.4 ? If not is
> it available in any newer version ?

It's certainly available in 8.3, as log_min_duration_statement in
postgresql.conf . You can find out if it's in 7.4, and if not what
version it was introduced in, by looking through the documentation for
versions 7.4 and up.

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how does pg handle concurrent queries and same queries

2008-07-28 Thread Matthew Wakeling

On Mon, 28 Jul 2008, Faludi Gábor wrote:

EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, count(letoltes.cid)  AS
elofordulas FROM letoltes GROUP BY cid ORDER BY elofordulas DESC LIMIT 5;
  QUERY PLAN
-
Limit  (cost=9255.05..9255.09 rows=5 width=4) (actual  time=604.734..604.743 > 
rows=5 loops=1)
  ->  Unique  (cost=9255.05..9257.26 rows=294 width=4) (actual 
time=604.732..604.737 rows=5 loops=1)
->  Sort  (cost=9255.05..9255.79 rows=294 width=4) (actual 
time=604.730..604.732 rows=5 loops=1)
  Sort Key: count(cid), cid
  ->  HashAggregate  (cost=9242.26..9243.00 rows=294 width=4) 
(actual time=604.109..604.417 rows=373 loops=1)
->  Seq Scan on letoltes  (cost=0.00..6920.51 rows=464351 
width=4) (actual time=0.022..281.413 rows=464351 loops=1)
Total runtime: 604.811 ms


So this query is doing a sequential scan of the letoltes table for each 
query. You may get some improvement by creating an index on cid and 
clustering on that index, but probably not much.


Moving to Postgres 8.3 will probably help a lot, as it will allow multiple 
queries to use the same sequential scan in parallel. That's assuming the 
entire table isn't in cache.


Another solution would be to create an additional table that contains the 
results of this query, and keep it up to date using triggers on the 
original table. Then query that table instead.


However, probably the best solution is to examine the problem and work out 
if you can alter the application to make it avoid doing such an expensive 
query so often. Perhaps it could cache the results.


Matthew

--
Psychotics are consistently inconsistent. The essence of sanity is to
be inconsistently inconsistent.
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how does pg handle concurrent queries and same queries

2008-07-28 Thread Alvaro Herrera
Craig Ringer wrote:
> Faludi Gábor wrote:
> 
> > . Why does the second and the later queries take the whole on second
> > if the dataset is the same . Shouldn't PG realise that the query is the same
> > so i give the user the same resultset ?
> 
> That would require a result cache. I don't know if Pg even has a query
> result cache - I don't think so, but I'm not sure.

It doesn't.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] A guide/tutorial to performance monitoring and tuning

2008-07-28 Thread Mark Wong
On Mon, Jul 21, 2008 at 10:24 PM, Greg Smith <[EMAIL PROTECTED]> wrote:
> On Mon, 21 Jul 2008, Francisco Reyes wrote:
>
>> On 2:59 pm 06/29/08 Greg Smith <[EMAIL PROTECTED]> wrote:
>>>
>>> Right now I'm working with a few other people to put together a more
>>> straightforward single intro guide that should address some of the
>>> vagueness you point out here,
>>
>> Was that ever completed?
>
> Not done yet; we're planning to have a first rev done in another couple of
> weeks.  The work in progress is at
> http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server and I'm due to
> work out another set of improvements to that this week during OSCON.

I'd also like to point out we're putting together some data revolving
about software raid, hardware raid, volume management, and filesystem
performance on a system donated by HP here:

http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide

Note that it's also a living guide and we've haven't started covering
some of the things I just mentioned.

Regards,
Mark

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance