Re: [PERFORM] CLUSTER and a problem
Tom Lane wrote:
> Andrzej Zawadzki writes:
>
>> # EXPLAIN ANALYZE SELECT telekredytid FROM kredytyag
>> WHERE TRUE
>>   AND kredytyag.id = 3064776
>>   AND NOT EXISTS
>>     (SELECT 1 FROM
>>       ( SELECT * FROM kredyty kr
>>         where telekredytid = 328650
>>         ORDER BY kr.datazaw DESC LIMIT 1 )
>>      kred where kred.bank = 2);
>
> So this is the slow bit:
>
>> ->  Subquery Scan kred  (cost=0.00..778.06 rows=1 width=0) (actual time=2045556.496..2045556.496 rows=0 loops=1)
>>       Filter: (kred.bank = 2)
>>       ->  Limit  (cost=0.00..778.05 rows=1 width=3873) (actual time=2045556.492..2045556.492 rows=0 loops=1)
>>             ->  Index Scan Backward using kredyty_datazaw on kredyty kr  (cost=0.00..1088490.39 rows=1399 width=3873) (actual time=2045556.487..2045556.487 rows=0 loops=1)
>>                   Filter: (telekredytid = 328650)
>
> It's doing a scan in descending datazaw order and hoping to find a row
> that has both telekredytid = 328650 and bank = 2.  Evidently there isn't
> one, so the indexscan runs clear to the end before it can report that the
> NOT EXISTS is true.  Unfortunately, you've more or less forced this
> inefficient query plan by wrapping some of the search conditions inside a
> LIMIT and some outside.  Try phrasing the NOT EXISTS query differently.
> Or, if you do this type of query a lot, a special-purpose index might be
> worthwhile.  It would probably be fast as-is if you had an index on
> (telekredytid, datazaw) (in that order).

That's no problem - we have already changed this query:

SELECT * FROM kredyty kr
 where kr.telekredytid = 328652
   and kr.bank = 2
   AND NOT EXISTS (SELECT * FROM kredyty k2 WHERE k2.bank <> 2
                     and k2.creationdate > kr.creationdate)

It works well.

But in fact this wasn't my point. My point was: why does the CLUSTER
operation have such a big and bad impact on the planner for this query?
Like I said: before CLUSTER the query ran in xx milliseconds :-)

-- 
Andrzej Zawadzki
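For reference, the special-purpose index Tom describes would look roughly like this (the index name is illustrative; the columns are the ones named in the thread):

CREATE INDEX kredyty_telekredytid_datazaw_idx
    ON kredyty (telekredytid, datazaw);

With such an index, the inner "ORDER BY kr.datazaw DESC LIMIT 1" subquery can fetch the newest row for a single telekredytid directly, instead of walking the whole kredyty_datazaw index backwards.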
[PERFORM] disable heavily updated (but small) table auto-vacuuming
Hello

I have a database where I create a table daily. Every day ~3 mln rows
are inserted into it, and each of them is updated two times. The process
lasts ~24 hours, so the db load is the same all the time. The total size
of the table is ~3GB.

My current vacuum settings are:

autovacuum = on
autovacuum_max_workers = 3
autovacuum_freeze_max_age = 20 (changed from 2)
vacuum_freeze_min_age = 1

I have over 250 mln frozen ids:

# SELECT datname, age(datfrozenxid) FROM pg_database;
   datname   |    age
-------------+-----------
 my_database | 256938425

and every day (since max age exceeded 200 mln) the current table is
vacuumed two hours after it was created.

My goal is to set the vacuum properties so that the current table is not
vacuumed while it is in use, and to vacuum it manually one day after it
was used. Is it enough to set

autovacuum = off
autovacuum_freeze_max_age = 20
vacuum_freeze_min_age = 1

and schedule a daily vacuum of the selected table in cron?

Thanks in advance for your help.

-- 
Ludwik Dyląg
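For illustration, the cron-scheduled manual vacuum described above could look like this (the table name is hypothetical, since the real table changes daily):

# crontab entry (sketch): vacuum yesterday's table at 03:00
0 3 * * *  vacuumdb --table=daily_table_yesterday --dbname=my_database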
Re: [PERFORM] possible wrong query plan on pg 8.3.5,
Quoting Vitalii Tymchyshyn:

> Maybe you have very bad disk access times (e.g. slow random access)? In
> that case everything should be OK while the data is in cache, and awful
> when it is not. Could you check disk IO speed && IO wait while doing the
> slow & fast queries?

No, I think all is OK with the disks. On my test server I have 8 SATA
drives in RAID 10, and on my production server I have 16 SATA drives in
RAID 10 dedicated for pg data and also 8 SATA drives in RAID 10 for the
OS and pg_xlog, and I do not have any IO wait. It is true, disks are
much slower compared to RAM.

> BTW: In this case, increasing shared buffers may help. At least this
> will prevent other applications and, AFAIK, sequential scans from
> pushing your index data out of the cache.

I will try to increase this value. I think the recommendation in the
docs was 1/4 of RAM, and on the production server I have it set to 1/4
of RAM (32 GB). Won't the OS cache the data from shared buffers a
second time?

The next step will be to move to pg 8.4, but it will take time for
testing.

> Best regards, Vitalii Tymchyshyn

regards, ivan.

- http://www.tooway.bg/
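For reference, the rule-of-thumb setting being discussed would look like this in postgresql.conf (a sketch for a 32 GB machine; the right value still needs testing against the actual workload):

# postgresql.conf (sketch): roughly 1/4 of 32 GB RAM, per the rule of thumb above
shared_buffers = 8GB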
Re: [PERFORM] CLUSTER and a problem
Andrzej Zawadzki wrote:
> Tom Lane wrote:
>> Andrzej Zawadzki writes:
>>
>>> # EXPLAIN ANALYZE SELECT telekredytid FROM kredytyag
>>> WHERE TRUE
>>>   AND kredytyag.id = 3064776
>>>   AND NOT EXISTS
>>>     (SELECT 1 FROM
>>>       ( SELECT * FROM kredyty kr
>>>         where telekredytid = 328650
>>>         ORDER BY kr.datazaw DESC LIMIT 1 )
>>>      kred where kred.bank = 2);
>>
>> So this is the slow bit:
>>
>>> ->  Subquery Scan kred  (cost=0.00..778.06 rows=1 width=0) (actual time=2045556.496..2045556.496 rows=0 loops=1)
>>>       Filter: (kred.bank = 2)
>>>       ->  Limit  (cost=0.00..778.05 rows=1 width=3873) (actual time=2045556.492..2045556.492 rows=0 loops=1)
>>>             ->  Index Scan Backward using kredyty_datazaw on kredyty kr  (cost=0.00..1088490.39 rows=1399 width=3873) (actual time=2045556.487..2045556.487 rows=0 loops=1)
>>>                   Filter: (telekredytid = 328650)
>>
>> It's doing a scan in descending datazaw order and hoping to find a row
>> that has both telekredytid = 328650 and bank = 2.  Evidently there isn't
>> one, so the indexscan runs clear to the end before it can report that the
>> NOT EXISTS is true.  Unfortunately, you've more or less forced this
>> inefficient query plan by wrapping some of the search conditions inside a
>> LIMIT and some outside.  Try phrasing the NOT EXISTS query differently.
>> Or, if you do this type of query a lot, a special-purpose index might be
>> worthwhile.  It would probably be fast as-is if you had an index on
>> (telekredytid, datazaw) (in that order).
>
> That's no problem - we have already changed this query:
>
> SELECT * FROM kredyty kr
>  where kr.telekredytid = 328652
>    and kr.bank = 2
>    AND NOT EXISTS (SELECT * FROM kredyty k2 WHERE k2.bank <> 2
>                      and k2.creationdate > kr.creationdate)
>
> It works well.
>
> But in fact this wasn't my point. My point was: why does the CLUSTER
> operation have such a big and bad impact on the planner for this query?
> Like I said: before CLUSTER the query ran in xx milliseconds :-)

Before CLUSTER it was:

# EXPLAIN ANALYZE SELECT telekredytid FROM kredytyag
WHERE TRUE
  AND kredytyag.id = 3064776
  AND NOT EXISTS
    ( SELECT 1 FROM
      ( SELECT * FROM kredyty kr
        where telekredytid = 328652
        ORDER BY kr.datazaw DESC LIMIT 1 )
      kred where kred.bank = 2) ;

                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=1317.25..1325.55 rows=1 width=4) (actual time=0.235..0.235 rows=0 loops=1)
   One-Time Filter: (NOT $0)
   InitPlan
     ->  Subquery Scan kred  (cost=1317.24..1317.25 rows=1 width=0) (actual time=0.188..0.188 rows=0 loops=1)
           Filter: (kred.bank = 2)
           ->  Limit  (cost=1317.24..1317.24 rows=1 width=4006) (actual time=0.172..0.172 rows=0 loops=1)
                 ->  Sort  (cost=1317.24..1320.27 rows=1212 width=4006) (actual time=0.069..0.069 rows=0 loops=1)
                       Sort Key: kr.datazaw
                       Sort Method: quicksort  Memory: 25kB
                       ->  Index Scan using kredyty_telekredytid_idx on kredyty kr  (cost=0.00..1311.18 rows=1212 width=4006) (actual time=0.029..0.029 rows=0 loops=1)
                             Index Cond: (telekredytid = 328652)
   ->  Index Scan using kredytyag_pkey on kredytyag  (cost=0.00..8.29 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=1)
         Index Cond: (id = 3064776)
 Total runtime: 1.026 ms
(14 rows)

and that's clear to me.

Probably a bad index for CLUSTER - investigating ;-)

-- 
Andrzej Zawadzki
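Incidentally, a quick way to check which index a table is currently clustered on is to look at pg_index.indisclustered, e.g. (a sketch):

-- Which index, if any, is kredyty clustered on?
SELECT indexrelid::regclass AS clustered_index
  FROM pg_index
 WHERE indrelid = 'kredyty'::regclass
   AND indisclustered;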
[PERFORM] Problem with partitioning and ORDER BY query plans
Hello,

In the same context as my previous thread on this mailing list (the
database holding 500k articles of a French daily newspaper), we now need
to handle the users' comments on the articles (1 million for now,
quickly growing).

In our context, we'll have three kinds of queries:
- queries on articles only;
- queries on comments only;
- queries on both articles and comments.

We tried to use the partitioning feature described at
http://www.postgresql.org/docs/8.4/static/ddl-partitioning.html , with
three tables:
- libeindex (master table, no data)
- libearticle (articles)
- libecontribution (comments)

The schema looks like:

CREATE TABLE libeindex (
  id integer,
  classname varchar(255),
  createdAt timestamp,
  modifiedAt timestamp,
  ...
  PRIMARY KEY (classname, id)
);

CREATE TABLE libecontribution (
  CHECK (classname = 'contribution'),
  PRIMARY KEY (classname, id)
) INHERITS (libeindex);

CREATE TABLE libearticle (
  CHECK (classname = 'article'),
  PRIMARY KEY (classname, id)
) INHERITS (libeindex);

Many indexes are created on the two subtables, including:

CREATE INDEX libearticle_createdAt_index ON libearticle (createdAt);
CREATE INDEX libearticle_class_createdAt_index ON libearticle (classname, createdAt);

The problem we have is that with the partitioned table, PostgreSQL is
now unable to use the "index scan backwards" query plan on a simple
"order by ... limit" query. For example:

libepart=> explain analyze SELECT classname, id FROM libeindex WHERE (classname IN ('article')) ORDER BY createdAt DESC LIMIT 50;

                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=114980.14..114980.27 rows=50 width=20) (actual time=4070.953..4071.076 rows=50 loops=1)
   ->  Sort  (cost=114980.14..116427.34 rows=578878 width=20) (actual time=4070.949..4070.991 rows=50 loops=1)
         Sort Key: public.libeindex.createdat
         Sort Method: top-N heapsort  Memory: 28kB
         ->  Result  (cost=0.00..95750.23 rows=578878 width=20) (actual time=0.068..3345.727 rows=578877 loops=1)
               ->  Append  (cost=0.00..95750.23 rows=578878 width=20) (actual time=0.066..2338.575 rows=578877 loops=1)
                     ->  Index Scan using libeindex_pkey on libeindex  (cost=0.00..8.27 rows=1 width=528) (actual time=0.011..0.011 rows=0 loops=1)
                           Index Cond: ((classname)::text = 'article'::text)
                     ->  Seq Scan on libearticle libeindex  (cost=0.00..95741.96 rows=578877 width=20) (actual time=0.051..1364.296 rows=578877 loops=1)
                           Filter: ((classname)::text = 'article'::text)
 Total runtime: 4071.195 ms
(11 rows)

libepart=> explain analyze SELECT classname, id FROM libearticle WHERE (classname IN ('article')) ORDER BY createdAt DESC LIMIT 50;

                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..9.07 rows=50 width=20) (actual time=0.033..0.200 rows=50 loops=1)
   ->  Index Scan Backward using libearticle_createdat_index on libearticle  (cost=0.00..105053.89 rows=578877 width=20) (actual time=0.030..0.112 rows=50 loops=1)
         Filter: ((classname)::text = 'article'::text)
 Total runtime: 0.280 ms
(4 rows)

As you can see, PostgreSQL doesn't realize that the table "libeindex" is
in fact empty and that it only needs to query the subtable, on which it
can use the "Index Scan Backward" query plan.

Is this a known limitation of the partitioning method? If so, it could
be interesting to mention it in the documentation. If not, is there a
way to work around the problem?
Regards,

-- 
Gaël Le Mignot - g...@pilotsystems.net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel: +33 1 44 53 05 55 - www.pilotsystems.net
Manage your contacts and newsletters: www.cockpit-mailing.com
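As the two plans above show, querying the child table directly sidesteps the problem. When both partitions have to be searched, one possible workaround (a sketch against the schema above, not tested, and assuming each child has a matching createdAt index) is to push the ORDER BY ... LIMIT into each child and merge the results:

SELECT classname, id FROM (
    (SELECT classname, id, createdAt FROM libearticle
      ORDER BY createdAt DESC LIMIT 50)
    UNION ALL
    (SELECT classname, id, createdAt FROM libecontribution
      ORDER BY createdAt DESC LIMIT 50)
) AS merged
ORDER BY createdAt DESC LIMIT 50;

Each inner SELECT can use its partition's createdAt index backwards, so only 100 rows ever reach the outer sort.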
Re: [PERFORM] How to post Performance Questions
Michael Glaesemann wrote:
> On Sep 14, 2009, at 16:55, Josh Berkus wrote:
>> Please read the following two documents before posting your
>> performance query here:
>>
>> http://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>> http://wiki.postgresql.org/wiki/SlowQueryQuestions
>>
>> This will help other users to troubleshoot your problems far
>> more rapidly.
>
> Can something similar be added to the footer of (at least) the
> performance list?

Perhaps on this page?: http://www.postgresql.org/community/lists/

-Kevin
Re: [PERFORM] View vs Stored Proc Performance
Merlin Moncure wrote:
> On Sat, Sep 12, 2009 at 7:51 AM, Dimitri Fontaine wrote:
>> Merlin Moncure writes:
>>> like joining the result to another table...the planner can see
>>> 'through' the view, etc.  in a function, the result is fetched first
>>> and materialized without looking at the rest of the query.
>>
>> I thought the planner would "see through" SQL language functions and
>> inline them when possible, so they often can make for parametrized
>> views...
>
> It can happen for simple functions but often it will not. For views it
> always happens.

Are functions in language 'sql' handled differently than those of
language 'plpgsql'? I think they're not, so in any case a function will
behave as a black box with regards to the planner and optimizer (and
views are always 'transparent').
Re: [PERFORM] View vs Stored Proc Performance
Ivan Voras writes:
> Are functions in language 'sql' handled differently than those of
> language 'plpgsql'?

Yes.

> I think they're not, so in any case a function will behave as a black box
> with regards to the planner and optimizer (and views are always
> 'transparent').

No.

regards, tom lane
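A minimal illustration of the difference, on a release recent enough to inline set-returning SQL functions (8.4 or later); all table and function names here are made up. A single-statement SQL-language function can be folded into the calling query, while the plpgsql equivalent is planned separately and behaves as a black box:

-- A throwaway table just for the illustration.
CREATE TABLE orders (id int, customer_id int, total numeric);

-- Inlinable: the planner can merge this into the calling query,
-- much like a parametrized view.
CREATE FUNCTION orders_for(int) RETURNS SETOF orders AS $$
    SELECT * FROM orders WHERE customer_id = $1;
$$ LANGUAGE sql STABLE;

-- Black box: planned on its own, and its result set is materialized
-- before the outer query sees it.
CREATE FUNCTION orders_for_plpgsql(int) RETURNS SETOF orders AS $$
BEGIN
    RETURN QUERY SELECT * FROM orders WHERE customer_id = $1;
END;
$$ LANGUAGE plpgsql STABLE;

-- Compare the plans: the first shows a plain scan of orders with both
-- conditions applied; the second shows a Function Scan with the extra
-- filter applied on top of it.
EXPLAIN SELECT * FROM orders_for(42)         WHERE total > 100;
EXPLAIN SELECT * FROM orders_for_plpgsql(42) WHERE total > 100;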
Re: [PERFORM] disable heavily updated (but small) table auto-vacuuming
2009/9/15 Ludwik Dylag:
> Hello
> I have a database where I create a table daily.
> Every day ~3 mln rows are inserted into it, and each of them is updated
> two times. The process lasts ~24 hours, so the db load is the same all
> the time. The total size of the table is ~3GB.
> My current vacuum settings are:
> autovacuum = on
> autovacuum_max_workers = 3
> autovacuum_freeze_max_age = 20 (changed from 2)
> vacuum_freeze_min_age = 1
> I have over 250 mln frozen ids.
> # SELECT datname, age(datfrozenxid) FROM pg_database;
>    datname   |    age
> -------------+-----------
>  my_database | 256938425
> and every day (since max age exceeded 200 mln) the current table is
> vacuumed two hours after it was created.
> My goal is to set the vacuum properties so that the current table is
> not vacuumed while it is in use, and to vacuum it manually one day
> after it was used.
> Is it enough to set
> autovacuum = off
> autovacuum_freeze_max_age = 20
> vacuum_freeze_min_age = 1
> and schedule a daily vacuum of the selected table in cron?

How about just disabling autovacuum for that table?

http://www.postgresql.org/docs/current/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS

...Robert
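A sketch of the per-table setting Robert points to, assuming a release that supports autovacuum storage parameters (8.4 or later); the table name is hypothetical:

-- Leave autovacuum on globally, but skip this one table.
ALTER TABLE my_daily_table SET (autovacuum_enabled = false);

Note that autovacuum will still force an anti-wraparound vacuum on the table once its age exceeds autovacuum_freeze_max_age, even with this parameter set.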
Re: [PERFORM] View vs Stored Proc Performance
2009/9/15 Tom Lane:
> Ivan Voras writes:
>> Are functions in language 'sql' handled differently than those of
>> language 'plpgsql'?
>
> Yes.
>
>> I think they're not, so in any case a function will behave as a black box
>> with regards to the planner and optimizer (and views are always
>> 'transparent').
>
> No.

Thanks! This is interesting information!

-- 
f+rEnSIBITAhITAhLR1nM9F4cIs5KJrhbcsVtUIt7K1MhWJy1A==
[PERFORM] statement stats extra load?
Is there a rule of thumb for the extra load that will be put on a system
when statement stats are turned on? And if so, where does that extra
load go? Disk? CPU? RAM?

-- 
“Don't eat anything you've ever seen advertised on TV”
   - Michael Pollan, author of "In Defense of Food"
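For reference, assuming "statement stats" here means the per-statement resource-usage options, the relevant postgresql.conf block is (a sketch):

# Per-statement resource usage reporting; the output is written to the
# server log, so expect some extra CPU for the timing calls plus I/O on
# the log destination.
log_statement_stats = off   # cannot be enabled together with the three below
log_parser_stats    = off
log_planner_stats   = off
log_executor_stats  = off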
Re: [PERFORM] CLUSTER and a problem
Andrzej Zawadzki wrote:
> Tom Lane wrote:
>> Andrzej Zawadzki writes:
>>
>>> # EXPLAIN ANALYZE SELECT telekredytid FROM kredytyag
>>> WHERE TRUE
>>>   AND kredytyag.id = 3064776
>>>   AND NOT EXISTS
>>>     (SELECT 1 FROM
>>>       ( SELECT * FROM kredyty kr
>>>         where telekredytid = 328650
>>>         ORDER BY kr.datazaw DESC LIMIT 1 )
>>>      kred where kred.bank = 2);
>>
>> So this is the slow bit:
>>
>>> ->  Subquery Scan kred  (cost=0.00..778.06 rows=1 width=0) (actual time=2045556.496..2045556.496 rows=0 loops=1)
>>>       Filter: (kred.bank = 2)
>>>       ->  Limit  (cost=0.00..778.05 rows=1 width=3873) (actual time=2045556.492..2045556.492 rows=0 loops=1)
>>>             ->  Index Scan Backward using kredyty_datazaw on kredyty kr  (cost=0.00..1088490.39 rows=1399 width=3873) (actual time=2045556.487..2045556.487 rows=0 loops=1)
>>>                   Filter: (telekredytid = 328650)
>>
>> It's doing a scan in descending datazaw order and hoping to find a row
>> that has both telekredytid = 328650 and bank = 2.  Evidently there isn't
>> one, so the indexscan runs clear to the end before it can report that the
>> NOT EXISTS is true.  Unfortunately, you've more or less forced this
>> inefficient query plan by wrapping some of the search conditions inside a
>> LIMIT and some outside.  Try phrasing the NOT EXISTS query differently.
>> Or, if you do this type of query a lot, a special-purpose index might be
>> worthwhile.  It would probably be fast as-is if you had an index on
>> (telekredytid, datazaw) (in that order).
>
> That's no problem - we have already changed this query:
>
> SELECT * FROM kredyty kr
>  where kr.telekredytid = 328652
>    and kr.bank = 2
>    AND NOT EXISTS (SELECT * FROM kredyty k2 WHERE k2.bank <> 2
>                      and k2.creationdate > kr.creationdate)
>
> It works well.
>
> But in fact this wasn't my point. My point was: why does the CLUSTER
> operation have such a big and bad impact on the planner for this query?
> Like I said: before CLUSTER the query ran in xx milliseconds :-)

OK, I've got it :-)

I've prepared a test database (on fast disks - CLUSTER took 2h anyway ;-)

Step 1:
qstest=# CREATE UNIQUE INDEX kredyty_desc_pkey ON kredyty using btree (id desc);
CREATE INDEX

Step 2:
qstest=# CLUSTER kredyty USING kredyty_desc_pkey;
CLUSTER

Step 3:
qstest=# ANALYZE kredyty;
ANALYZE

Step 4:
qstest=# EXPLAIN ANALYZE SELECT telekredytid FROM kredytyag
WHERE TRUE
  AND kredytyag.id = 3064776
  AND NOT EXISTS
    ( SELECT 1 FROM
      ( SELECT * FROM kredyty kr
        where telekredytid = 328652
        ORDER BY kr.datazaw DESC LIMIT 1 )
      kred where kred.bank = 2) ;

                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=833.09..841.38 rows=1 width=4) (actual time=70.050..70.050 rows=0 loops=1)
   One-Time Filter: (NOT $0)
   InitPlan
     ->  Subquery Scan kred  (cost=833.07..833.09 rows=1 width=0) (actual time=48.223..48.223 rows=0 loops=1)
           Filter: (kred.bank = 2)
           ->  Limit  (cost=833.07..833.08 rows=1 width=3975) (actual time=48.206..48.206 rows=0 loops=1)