Re: [PERFORM] CLUSTER and a problem

2009-09-15 Thread Andrzej Zawadzki
Tom Lane wrote:
> Andrzej Zawadzki  writes:
>   
>> # EXPLAIN ANALYZE SElect telekredytid from kredytyag
>> WHERE TRUE
>> AND kredytyag.id = 3064776
>> AND NOT EXISTS
>> (SELECT 1 FROM
>> ( SELECT * FROM kredyty kr
>> where telekredytid = 328650
>> ORDER BY kr.datazaw DESC LIMIT 1 )
>> kred where kred.bank = 2);
>> 
>
> So this is the slow bit:
>
>   
>>  ->  Subquery Scan kred  (cost=0.00..778.06 rows=1 width=0) (actual
>> time=2045556.496..2045556.496 rows=0 loops=1)
>>Filter: (kred.bank = 2)
>>->  Limit  (cost=0.00..778.05 rows=1 width=3873) (actual
>> time=2045556.492..2045556.492 rows=0 loops=1)
>>  ->  Index Scan Backward using kredyty_datazaw on
>> kredyty kr  (cost=0.00..1088490.39 rows=1399 width=3873) (actual
>> time=2045556.487..2045556.487 rows=0 loops=1)
>>Filter: (telekredytid = 328650)
>> 
>
> It's doing a scan in descending datazaw order and hoping to find a row
> that has both telekredytid = 328650 and bank = 2.  Evidently there isn't
> one, so the indexscan runs clear to the end before it can report that the
> NOT EXISTS is true.  Unfortunately, you've more or less forced this
> inefficient query plan by wrapping some of the search conditions inside a
> LIMIT and some outside.  Try phrasing the NOT EXISTS query differently.
> Or, if you do this type of query a lot, a special-purpose index might be
> worthwhile.  It would probably be fast as-is if you had an index on
> (telekredytid, datazaw) (in that order).
>   
That's no problem - we already has changed this query:
SELECT * FROM kredyty kr
where kr.telekredytid = 328652
and kr.bank = 2
AND NOT EXISTS (SELECT * from kredyty k2 WHERE k2.bank<>2
and k2.creationdate > kr.creationdate)
Works good.

But in fact this wasn't my point.
My point was: why operation CLUSTER has such a big and bad impact on
planer for this query?
Like I sad: before CLUSTER query was run in xx milliseconds :-)

-- 
Andrzej Zawadzki

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] disable heavily updated (but small) table auto-vecuuming

2009-09-15 Thread Ludwik Dylag
Hello
I have a database where I daily create a table.
Every day it is being inserted with ~3mln rows and each of them is being
updated two times.The process lasts ~24 hours so the db load is the same at
all the time. total size of the table is ~3GB.

My current vacuum settings are:
autovacuum = on
autovacuum_max_workers = 3
autovacuum_freeze_max_age = 20 (changed from 2)
vacuum_freeze_min_age = 1

I have over 250 mln of frozen ids.
# SELECT datname, age(datfrozenxid) FROM pg_database;
  datname   |age
+---
my_database | 256938425

and every day (since max age exceeded 200mln.) the current table is being
vacuumed two hours after it was created.

My goal is to set the vacuum properties so the current table is not vacuumed
when it is used. And to vacuum it manually one day after it was used.

Is it enough to set
autovacuum=off
autovacuum_freeze_max_age=20
vacuum_freeze_min_age = 1
and shedule in cron daily vacuum on selected table?


Thanks in advance for your help.

-- 
Ludwik Dyląg


Re: [PERFORM] possible wrong query plan on pg 8.3.5,

2009-09-15 Thread zz_11

Цитат от Віталій Тимчишин :


May be you have very bad disk access times (e.g. slow random access)? In
this case everything should be OK while data in cache and awful, when not.
Could you check disk IO speed && IO wait while doing slow & fast query.



No, I think all is ok with disks. On my test server I have 8 SATA in  
RAID 10 and on my production server I have 16 SATA in RAID10 dedicated  
for pg data and also 8 SATA in RAID 10 for OS and pg_x_log and I do  
not have any IO wait.

It is true, disks are much slower compared to RAM.


BTW: In this case, increasing shared buffers may help. At least this will
prevent other applications & AFAIK sequence scans to move your index data
from cache.


I will try to increase this value.
I think recomendation in docs was 1/4 from RAM, and on production  
server I have it setup to 1/4 from RAM ( 32 GB).


Will os not cache the data from shared buffers for second time ?

The next step will be to move to pg 8.4, but I i twill tak etime for testing.



Best regards, Vitalii Tymchyshyn



regards,
ivan.

-
 http://www.tooway.bg/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CLUSTER and a problem

2009-09-15 Thread Andrzej Zawadzki
Andrzej Zawadzki wrote:
> Tom Lane wrote:
>   
>> Andrzej Zawadzki  writes:
>>   
>> 
>>> # EXPLAIN ANALYZE SElect telekredytid from kredytyag
>>> WHERE TRUE
>>> AND kredytyag.id = 3064776
>>> AND NOT EXISTS
>>> (SELECT 1 FROM
>>> ( SELECT * FROM kredyty kr
>>> where telekredytid = 328650
>>> ORDER BY kr.datazaw DESC LIMIT 1 )
>>> kred where kred.bank = 2);
>>> 
>>>   
>> So this is the slow bit:
>>
>>   
>> 
>>>  ->  Subquery Scan kred  (cost=0.00..778.06 rows=1 width=0) (actual
>>> time=2045556.496..2045556.496 rows=0 loops=1)
>>>Filter: (kred.bank = 2)
>>>->  Limit  (cost=0.00..778.05 rows=1 width=3873) (actual
>>> time=2045556.492..2045556.492 rows=0 loops=1)
>>>  ->  Index Scan Backward using kredyty_datazaw on
>>> kredyty kr  (cost=0.00..1088490.39 rows=1399 width=3873) (actual
>>> time=2045556.487..2045556.487 rows=0 loops=1)
>>>Filter: (telekredytid = 328650)
>>> 
>>>   
>> It's doing a scan in descending datazaw order and hoping to find a row
>> that has both telekredytid = 328650 and bank = 2.  Evidently there isn't
>> one, so the indexscan runs clear to the end before it can report that the
>> NOT EXISTS is true.  Unfortunately, you've more or less forced this
>> inefficient query plan by wrapping some of the search conditions inside a
>> LIMIT and some outside.  Try phrasing the NOT EXISTS query differently.
>> Or, if you do this type of query a lot, a special-purpose index might be
>> worthwhile.  It would probably be fast as-is if you had an index on
>> (telekredytid, datazaw) (in that order).
>>   
>> 
> That's no problem - we already has changed this query:
> SELECT * FROM kredyty kr
> where kr.telekredytid = 328652
> and kr.bank = 2
> AND NOT EXISTS (SELECT * from kredyty k2 WHERE k2.bank<>2
> and k2.creationdate > kr.creationdate)
> Works good.
>
> But in fact this wasn't my point.
> My point was: why operation CLUSTER has such a big and bad impact on
> planer for this query?
> Like I sad: before CLUSTER query was run in xx milliseconds :-)
>
>   
Before CLUSTER was:

# EXPLAIN ANALYZE SELECT telekredytid FROM kredytyag
WHERE TRUE
AND kredytyag.id = 3064776
AND NOT EXISTS
(
SELECT 1 FROM
(
SELECT * FROM kredyty kr
where telekredytid = 328652
ORDER BY kr.datazaw DESC LIMIT 1
)
kred where kred.bank = 2)
;
  
QUERY
PLAN


 Result  (cost=1317.25..1325.55 rows=1 width=4) (actual
time=0.235..0.235 rows=0 loops=1)
   One-Time Filter: (NOT $0)
   InitPlan
 ->  Subquery Scan kred  (cost=1317.24..1317.25 rows=1 width=0)
(actual time=0.188..0.188 rows=0 loops=1)
   Filter: (kred.bank = 2)
   ->  Limit  (cost=1317.24..1317.24 rows=1 width=4006) (actual
time=0.172..0.172 rows=0 loops=1)
 ->  Sort  (cost=1317.24..1320.27 rows=1212 width=4006)
(actual time=0.069..0.069 rows=0 loops=1)
   Sort Key: kr.datazaw
   Sort Method:  quicksort  Memory: 25kB
   ->  Index Scan using kredyty_telekredytid_idx on
kredyty kr  (cost=0.00..1311.18 rows=1212 width=4006) (actual
time=0.029..0.029 rows=0 loops=1)
 Index Cond: (telekredytid = 328652)
   ->  Index Scan using kredytyag_pkey on kredytyag  (cost=0.00..8.29
rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=1)
 Index Cond: (id = 3064776)
 Total runtime: 1.026 ms
(14 rows)

and that's clear for me.
Probably bad index for CLUSTER - Investigating ;-)

-- 
Andrzej Zawadzki

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Problem with partitionning and orderby query plans

2009-09-15 Thread Gaël Le Mignot

Hello,

In the same context that my  previous thread on this mailing list (the
database holding  500k articles of  a french daily newspaper),  we now
need to handle the users' comments on the articles (1 million for now,
quickly growing).

In our context, we'll have three kind of queries :

- queries on articles only ;

- queries on comments only ;

- queries on both articles and comments.

We tried to use the partitionning feature described at
http://www.postgresql.org/docs/8.4/static/ddl-partitioning.html , with three
tables :

- libeindex (master table, no data)

- libearticle (articles)

- libecontribution (comments)

The schema looks like :

CREATE TABLE libeindex (

  id integer,
  classname varchar(255),
  createdAt timestamp,
  modifiedAt timestamp,
...
  PRIMARY KEY (classname, id)
);


CREATE TABLE libecontribution (
  CHECK (classname = 'contribution'), 
  PRIMARY KEY (classname, id)
) INHERITS (libeindex) ;

CREATE TABLE libearticle (
  CHECK (classname = 'article'), 
  PRIMARY KEY (classname, id)
) INHERITS (libeindex) ;

With many indexes are created on the two subtables, including :
CREATE INDEX libearticle_createdAt_index ON libearticle (createdAt);
CREATE INDEX libearticle_class_createdAt_index ON libearticle (classname, 
createdAt);

The problem we  have is that with the  partionned table, PostgreSQL is
now unable  to use the "index  scan backwards" query plan  on a simple
"order by limit" query.

For example :

libepart=> explain analyze SELECT classname, id FROM libeindex WHERE (classname 
IN ('article')) ORDER BY createdAt DESC LIMIT 50;
   QUERY 
PLAN
-
 Limit  (cost=114980.14..114980.27 rows=50 width=20) (actual 
time=4070.953..4071.076 rows=50 loops=1)
   ->  Sort  (cost=114980.14..116427.34 rows=578878 width=20) (actual 
time=4070.949..4070.991 rows=50 loops=1)
 Sort Key: public.libeindex.createdat
 Sort Method:  top-N heapsort  Memory: 28kB
 ->  Result  (cost=0.00..95750.23 rows=578878 width=20) (actual 
time=0.068..3345.727 rows=578877 loops=1)
   ->  Append  (cost=0.00..95750.23 rows=578878 width=20) (actual 
time=0.066..2338.575 rows=578877 loops=1)
 ->  Index Scan using libeindex_pkey on libeindex  
(cost=0.00..8.27 rows=1 width=528) (actual time=0.011..0.011 rows=0 loops=1)
   Index Cond: ((classname)::text = 'article'::text)
 ->  Seq Scan on libearticle libeindex  
(cost=0.00..95741.96 rows=578877 width=20) (actual time=0.051..1364.296 
rows=578877 loops=1)
   Filter: ((classname)::text = 'article'::text)
 Total runtime: 4071.195 ms
(11 rows)

libepart=> explain analyze SELECT classname, id FROM libearticle WHERE 
(classname IN ('article')) ORDER BY createdAt DESC LIMIT 50;
 
QUERY PLAN  
   

 Limit  (cost=0.00..9.07 rows=50 width=20) (actual time=0.033..0.200 rows=50 
loops=1)
   ->  Index Scan Backward using libearticle_createdat_index on libearticle  
(cost=0.00..105053.89 rows=578877 width=20) (actual time=0.030..0.112 rows=50 
loops=1)
 Filter: ((classname)::text = 'article'::text)
 Total runtime: 0.280 ms
(4 rows)

As you can see, PostgreSQL  doesn't realize that the table "libeindex"
is in  fact empty, and  that it only  needs to query the  subtable, on
which it can use the "Index Scan Backward" query plan.

Is this a known limitation of the partionning method ? If so, it could
be interesting to mention it on  the documentation. If not, is there a
way to work around the problem ?

Regards,

-- 
Gaël Le Mignot - g...@pilotsystems.net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to post Performance Questions

2009-09-15 Thread Kevin Grittner
Michael Glaesemann  wrote:
> On Sep 14, 2009, at 16:55 , Josh Berkus wrote:
 
>> Please read the following two documents before posting your  
>> performance query here:
>>
>> http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>> http://wiki.postgresql.org/wiki/SlowQueryQuestions
>>
>> This will help other users to troubleshoot your problems far
>> more rapidly.
> 
> Can something similar be added to the footer of (at least) the  
> performance list?
 
Perhaps on this page?:
 
http://www.postgresql.org/community/lists/
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] View vs Stored Proc Performance

2009-09-15 Thread Ivan Voras

Merlin Moncure wrote:

On Sat, Sep 12, 2009 at 7:51 AM, Dimitri Fontaine
 wrote:

Merlin Moncure  writes:

like joining the result to another table...the planner can see
'through' the view, etc.  in a function, the result is fetched first
and materialized without looking at the rest of the query.

I though the planner would "see through" SQL language functions and
inline them when possible, so they often can make for parametrized
views...


It can happen for simple functions but often it will not.  For views
it always happens.


Are functions in language 'sql' handled differently than those of 
language 'plpgsql'?


I think they're not so in any case a function will behave as a black box 
with regards to the planner and optimizer (and views are always 
'transparent').



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] View vs Stored Proc Performance

2009-09-15 Thread Tom Lane
Ivan Voras  writes:
> Are functions in language 'sql' handled differently than those of 
> language 'plpgsql'?

Yes.

> I think they're not so in any case a function will behave as a black box 
> with regards to the planner and optimizer (and views are always 
> 'transparent').

No.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] disable heavily updated (but small) table auto-vecuuming

2009-09-15 Thread Robert Haas
2009/9/15 Ludwik Dylag :
> Hello
> I have a database where I daily create a table.
> Every day it is being inserted with ~3mln rows and each of them is being
> updated two times.The process lasts ~24 hours so the db load is the same at
> all the time. total size of the table is ~3GB.
> My current vacuum settings are:
> autovacuum = on
> autovacuum_max_workers = 3
> autovacuum_freeze_max_age = 20 (changed from 2)
> vacuum_freeze_min_age = 1
> I have over 250 mln of frozen ids.
> # SELECT datname, age(datfrozenxid) FROM pg_database;
>   datname   |    age
> +---
> my_database | 256938425
> and every day (since max age exceeded 200mln.) the current table is being
> vacuumed two hours after it was created.
> My goal is to set the vacuum properties so the current table is not vacuumed
> when it is used. And to vacuum it manually one day after it was used.
> Is it enough to set
> autovacuum=off
> autovacuum_freeze_max_age=20
> vacuum_freeze_min_age = 1
> and shedule in cron daily vacuum on selected table?

How about just disabling autovacuum for that table?

http://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] View vs Stored Proc Performance

2009-09-15 Thread Ivan Voras
2009/9/15 Tom Lane :
> Ivan Voras  writes:
>> Are functions in language 'sql' handled differently than those of
>> language 'plpgsql'?
>
> Yes.
>
>> I think they're not so in any case a function will behave as a black box
>> with regards to the planner and optimizer (and views are always
>> 'transparent').
>
> No.

Thanks! This is interesting information!

-- 
f+rEnSIBITAhITAhLR1nM9F4cIs5KJrhbcsVtUIt7K1MhWJy1A==

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] statement stats extra load?

2009-09-15 Thread Alan McKay
Is there a rule of thumb for the extra load that will be put on a
system when statement stats are turned on?

And if so, where does that extra load go?Disk?  CPU?  RAM?

-- 
“Don't eat anything you've ever seen advertised on TV”
 - Michael Pollan, author of "In Defense of Food"

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CLUSTER and a problem

2009-09-15 Thread Andrzej Zawadzki
Andrzej Zawadzki wrote:
> Tom Lane wrote:
>   
>> Andrzej Zawadzki  writes:
>>   
>> 
>>> # EXPLAIN ANALYZE SElect telekredytid from kredytyag
>>> WHERE TRUE
>>> AND kredytyag.id = 3064776
>>> AND NOT EXISTS
>>> (SELECT 1 FROM
>>> ( SELECT * FROM kredyty kr
>>> where telekredytid = 328650
>>> ORDER BY kr.datazaw DESC LIMIT 1 )
>>> kred where kred.bank = 2);
>>> 
>>>   
>> So this is the slow bit:
>>
>>   
>> 
>>>  ->  Subquery Scan kred  (cost=0.00..778.06 rows=1 width=0) (actual
>>> time=2045556.496..2045556.496 rows=0 loops=1)
>>>Filter: (kred.bank = 2)
>>>->  Limit  (cost=0.00..778.05 rows=1 width=3873) (actual
>>> time=2045556.492..2045556.492 rows=0 loops=1)
>>>  ->  Index Scan Backward using kredyty_datazaw on
>>> kredyty kr  (cost=0.00..1088490.39 rows=1399 width=3873) (actual
>>> time=2045556.487..2045556.487 rows=0 loops=1)
>>>Filter: (telekredytid = 328650)
>>> 
>>>   
>> It's doing a scan in descending datazaw order and hoping to find a row
>> that has both telekredytid = 328650 and bank = 2.  Evidently there isn't
>> one, so the indexscan runs clear to the end before it can report that the
>> NOT EXISTS is true.  Unfortunately, you've more or less forced this
>> inefficient query plan by wrapping some of the search conditions inside a
>> LIMIT and some outside.  Try phrasing the NOT EXISTS query differently.
>> Or, if you do this type of query a lot, a special-purpose index might be
>> worthwhile.  It would probably be fast as-is if you had an index on
>> (telekredytid, datazaw) (in that order).
>>   
>> 
> That's no problem - we already has changed this query:
> SELECT * FROM kredyty kr
> where kr.telekredytid = 328652
> and kr.bank = 2
> AND NOT EXISTS (SELECT * from kredyty k2 WHERE k2.bank<>2
> and k2.creationdate > kr.creationdate)
> Works good.
>
> But in fact this wasn't my point.
> My point was: why operation CLUSTER has such a big and bad impact on
> planer for this query?
> Like I sad: before CLUSTER query was run in xx milliseconds :-)
>   
OK I've got it :-)
I've prepared test database (on fast disks - CLUSTER took 2h anyway ;-)

Step 1:
qstest=# CREATE UNIQUE INDEX kredyty_desc_pkey ON kredyty using btree
(id desc);
CREATE
INDEX   
Step 2:
qstest=# CLUSTER kredyty USING kredyty_desc_pkey;
CLUSTER 
Step 3:
qstest=# ANALYZE kredyty;
ANALYZE 
Step 4:
qstest=# EXPLAIN ANALYZE SELECT telekredytid FROM kredytyag
WHERE TRUE 
AND kredytyag.id = 3064776 
AND NOT EXISTS 
(  
SELECT 1 FROM  
(  
SELECT * FROM kredyty kr   
where telekredytid = 328652
ORDER BY kr.datazaw DESC LIMIT 1   
)  
kred where kred.bank = 2)  
;  
  
QUERY
PLAN


  

-


  

 Result  (cost=833.09..841.38 rows=1 width=4) (actual
time=70.050..70.050 rows=0
loops=1)


   One-Time Filter: (NOT
$0) 


  
InitPlan

  

 ->  Subquery Scan kred  (cost=833.07..833.09 rows=1 width=0)
(actual time=48.223..48.223 rows=0
loops=1)
   Filter: (kred.bank =
2)  
 

   ->  Limit  (cost=833.07..833.08 rows=1 width=3975) (actual
time=48.206..48.206 rows=0
loops=1)