Re: [PERFORM] planner favors seq scan too early
Markus Bertheau wrote:
> I'm getting a plan that uses a sequential scan on ext_feeder_item instead of several index scans, which slows down the query significantly:
>
> # explain analyze select fi.pub_date from ext_feeder_item fi
>     where fi.feed_id in (select id from ext_feeder_feed ff where ff.is_system)
>     order by pub_date desc;
> Sort (cost=298545.70..299196.46 rows=260303 width=8) (actual time=89299.623..89302.146 rows=807 loops=1)
>
> Using LIMIT in the subquery I can see that starting with 50 values for the IN, the planner starts to prefer the seq scan. Plan for LIMIT 49:
> Sort (cost=277689.24..277918.39 rows=91660 width=8) (actual time=477.769..478.193 rows=137 loops=1)
>
> Note that the rows estimate for the index scan is way off. Increasing the statistics target for ext_feeder_item.feed_id to 100 lets the planner favor the index scan up to LIMIT 150 in the subquery. Using enable_seqscan=false, I see that the index scan plan continues to outperform the seq scan plan even with LIMIT 1500 in the subquery (1196 values actually returned from it):
> Sort (cost=100925142.27..100925986.74 rows=337787 width=8) (actual time=102.111..104.627 rows=807 loops=1)
>
> Why does the planner choose that way, and what can I do to make it choose the better plan, preferably without specifying LIMIT and a perhaps unreasonably high statistics target for ext_feeder_item.feed_id?

Although the index scans are fast enough, their cost estimate is much higher. This suggests you need to tweak your planner cost settings:

http://www.postgresql.org/docs/8.3/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

I'd probably start by reducing random_page_cost if you have a reasonable disk system, and by making sure effective_cache_size is set accurately.

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
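Richard's suggestions can be sketched roughly as follows. The values here are illustrative assumptions, not measurements from this system; try them per-session first and keep only what the plans justify:

```sql
-- Session-local experiment: tell the planner random I/O is cheaper than the
-- default of 4.0, which favors index scans (value is an assumption to test).
SET random_page_cost = 2.0;

-- Planner hint for how much data the OS is likely caching
-- (hypothetical value for a machine with several GB of RAM).
SET effective_cache_size = '4GB';

-- Raise the per-column statistics target so the feed_id row estimate
-- improves, then re-collect statistics.
ALTER TABLE ext_feeder_item ALTER COLUMN feed_id SET STATISTICS 100;
ANALYZE ext_feeder_item;

-- Re-check the plan with the original query:
EXPLAIN ANALYZE
SELECT fi.pub_date
FROM ext_feeder_item fi
WHERE fi.feed_id IN (SELECT id FROM ext_feeder_feed ff WHERE ff.is_system)
ORDER BY pub_date DESC;
```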
Re: [PERFORM] Question about shared_buffers and cpu usage
On 21-Feb-08, at 12:13 AM, bh yuan wrote:
> Hi
>
> I am using Postgres 8.3 on an 8G memory, Xeon X5355 Quad Core x 2 processor RH5 machine with 10G of data (some tables have about 2,000,000~5,000,000 rows). I have two questions.
>
> 1. How do I set shared_buffers and the other postgresql.conf parameters for best performance? I only run Postgres 8.3 on the machine, so I set shared_buffers = 7168MB (7G). But somebody said it is too big, so I am confused.

Yes, it is too big! Make it 2G to start.
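A common starting point along these lines can be sketched as a postgresql.conf fragment. These numbers are assumptions for an 8GB dedicated server, not tuned values; benchmark before committing to them:

```
# postgresql.conf - illustrative starting values for an 8GB dedicated box
shared_buffers = 2GB            # the 2G starting point suggested above,
                                # rather than ~90% of RAM
effective_cache_size = 6GB      # planner hint: roughly what the OS page
                                # cache can hold; allocates no memory itself
```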
Re: [PERFORM] 7 hrs for a pg_restore?
On Wed, 20 Feb 2008, Tom Lane wrote:
> > However, this resulted in random errors from Postgres - something to do with locked tables. So I changed it so that no two threads create indexes for the same table at once, and that solved it.
>
> How long ago was that? There used to be some issues with two CREATE INDEXes both trying to update the pg_class row, but I thought we'd fixed it.

It was a while back, and that sounds like exactly the error it returned. It sounds like you have fixed it.

Matthew

--
Software suppliers are trying to make their software packages more 'user-friendly'. Their best approach, so far, has been to take all the old brochures, and stamp the words 'user-friendly' on the cover.
-- Bill Gates
Re: [PERFORM] 7 hrs for a pg_restore?
Jeff writes:
> I wonder if it would be worthwhile if pg_restore could emit a warning if maint_work_mem is "low" (start flamewar on what "low" is).
>
> And as an addition to that - allow a cmd line arg to have pg_restore bump it before doing its work? On several occasions I was moving a largish table and the COPY part went plenty fast, but when it hit index creation it slowed down to a crawl due to low maint_work_mem..

I have made a comparison restoring a production dump with default and large maintenance_work_mem. The speedup here is only 5% (12'30 => 11'50).

Apparently, on the restored database, data is 1337 MB[1] and indexes 644 MB[2]. Pg is 8.2.3, checkpoint_segments 3, maintenance_work_mem default (16MB) then 512MB, shared_buffers 384MB. The disks are rather slow (Dell's LSI Logic RAID1); hdparm reports 82 MB/sec for reads.

Ref:

[1] db=# SELECT sum(relpages)*8/1024 FROM pg_class, pg_namespace
         WHERE pg_namespace.oid = pg_class.relnamespace
           AND relkind = 'r' AND nspname = 'public';
     ?column?
    ----------
         1337

    (query run after ANALYZE)

    Note that there are only a few toast pages to account for:

    db=# SELECT relname, relpages FROM pg_class
         WHERE relname LIKE '%toast%' ORDER BY relpages DESC;
           relname        | relpages
    ----------------------+----------
     pg_toast_2618        |       17
     pg_toast_2618_index  |        2
     pg_toast_87570_index |        1
     pg_toast_87582_index |        1
    (...)

[2] db=# SELECT sum(relpages)*8/1024 FROM pg_class, pg_namespace
         WHERE pg_namespace.oid = pg_class.relnamespace
           AND relkind = 'i' AND nspname = 'public';
     ?column?
    ----------
          644

--
Guillaume Cottenceau
Re: [PERFORM] 7 hrs for a pg_restore?
Guillaume Cottenceau <[EMAIL PROTECTED]> writes:
> I have made a comparison restoring a production dump with default and large maintenance_work_mem. The speedup here is only 5% (12'30 => 11'50).
>
> Apparently, on the restored database, data is 1337 MB[1] and indexes 644 MB[2]. Pg is 8.2.3, checkpoint_segments 3, maintenance_work_mem default (16MB) then 512MB, shared_buffers 384MB. The disks are rather slow (Dell's LSI Logic RAID1); hdparm reports 82 MB/sec for reads.

The main thing that jumps out at me is that boosting checkpoint_segments would probably help. I tend to set it to 30 or so (note that this corresponds to about 1GB taken up by pg_xlog).

regards, tom lane
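Putting Tom's suggestion together with the maintenance_work_mem comparison above, a restore-time configuration can be sketched like this (values taken from the thread; they are starting points for this 8.2-era setup, not measured optima):

```
# postgresql.conf - temporary settings while running a large pg_restore
checkpoint_segments = 30      # Tom's rule of thumb (~1GB in pg_xlog); far fewer
                              # forced checkpoints during COPY than the default 3
maintenance_work_mem = 512MB  # speeds up the CREATE INDEX phase of the restore
```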
Re: [PERFORM] 7 hrs for a pg_restore?
On Feb 21, 2008, at 12:28 PM, Guillaume Cottenceau wrote:
> I have made a comparison restoring a production dump with default and large maintenance_work_mem. The speedup here is only 5% (12'30 => 11'50).

At one point I was evaluating several server vendors and did a bunch of DB restores. The one thing that gave me the biggest benefit was to bump the number of checkpoint segments to a high number, like 128 or 256. Everything else was mostly minor increases in speed.
[PERFORM] 4s query want to run faster
Hi all,

The following query takes about 4s to run on a 16GB RAM server. Any ideas why it doesn't use the indexes on the primary keys in the join conditions?

select i.inuid, count(*) as total
from cte.instrumentounidade i
inner join cte.pontuacao p on p.inuid = i.inuid
inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
inner join cte.subacaoindicador si on si.aciid = ai.aciid
where i.itrid = 2 and p.ptostatus = 'A'
group by i.inuid
having count(*) > 0

HashAggregate (cost=47905.87..47941.01 rows=2008 width=4)
  Filter: (count(*) > 0)
  -> Hash Join (cost=16307.79..46511.45 rows=185923 width=4)
     Hash Cond: (si.aciid = ai.aciid)
     -> Seq Scan on subacaoindicador si (cost=0.00..22812.17 rows=368817 width=4)
     -> Hash (cost=16211.40..16211.40 rows=38556 width=8)
        -> Hash Join (cost=9018.20..16211.40 rows=38556 width=8)
           Hash Cond: (p.inuid = i.inuid)
           -> Hash Join (cost=8908.41..15419.10 rows=39593 width=8)
              Hash Cond: (ai.ptoid = p.ptoid)
              -> Seq Scan on acaoindicador ai (cost=0.00..4200.84 rows=76484 width=8)
              -> Hash (cost=8678.33..8678.33 rows=92034 width=8)
                 -> Seq Scan on pontuacao p (cost=0.00..8678.33 rows=92034 width=8)
                    Filter: (ptostatus = 'A'::bpchar)
           -> Hash (cost=104.78..104.78 rows=2008 width=4)
              -> Seq Scan on instrumentounidade i (cost=0.00..104.78 rows=2008 width=4)
                 Filter: (itrid = 2)
Re: [PERFORM] 4s query want to run faster
HashAggregate (cost=47818.40..47853.12 rows=1984 width=4) (actual time=5738.879..5743.390 rows=1715 loops=1)
  Filter: (count(*) > 0)
  -> Hash Join (cost=16255.99..46439.06 rows=183912 width=4) (actual time=1887.974..5154.207 rows=241693 loops=1)
     Hash Cond: (si.aciid = ai.aciid)
     -> Seq Scan on subacaoindicador si (cost=0.00..22811.98 rows=368798 width=4) (actual time=0.108..1551.816 rows=368798 loops=1)
     -> Hash (cost=16160.64..16160.64 rows=38141 width=8) (actual time=1887.790..1887.790 rows=52236 loops=1)
        -> Hash Join (cost=9015.31..16160.64 rows=38141 width=8) (actual time=980.058..1773.530 rows=52236 loops=1)
           Hash Cond: (p.inuid = i.inuid)
           -> Hash Join (cost=8905.89..15376.11 rows=39160 width=8) (actual time=967.116..1568.028 rows=54225 loops=1)
              Hash Cond: (ai.ptoid = p.ptoid)
              -> Seq Scan on acaoindicador ai (cost=0.00..4200.84 rows=76484 width=8) (actual time=0.080..259.412 rows=76484 loops=1)
              -> Hash (cost=8678.33..8678.33 rows=91026 width=8) (actual time=966.841..966.841 rows=92405 loops=1)
                 -> Seq Scan on pontuacao p (cost=0.00..8678.33 rows=91026 width=8) (actual time=0.087..746.528 rows=92405 loops=1)
                    Filter: (ptostatus = 'A'::bpchar)
           -> Hash (cost=104.46..104.46 rows=1984 width=4) (actual time=12.913..12.913 rows=1983 loops=1)
              -> Seq Scan on instrumentounidade i (cost=0.00..104.46 rows=1984 width=4) (actual time=0.091..8.879 rows=1983 loops=1)
                 Filter: (itrid = 2)

Total runtime: 5746.415 ms

On Thu, Feb 21, 2008 at 5:58 PM, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On Thu, Feb 21, 2008 at 2:48 PM, Adonias Malosso <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > The following query takes about 4s to run on a 16GB RAM server. Any ideas why it doesn't use the indexes on the primary keys in the join conditions?
> >
> > select i.inuid, count(*) as total
> > from cte.instrumentounidade i
> > inner join cte.pontuacao p on p.inuid = i.inuid
> > inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
> > inner join cte.subacaoindicador si on si.aciid = ai.aciid
> > where i.itrid = 2 and p.ptostatus = 'A'
> > group by i.inuid
> > having count(*) > 0
>
> What does explain analyze say about that query?
Re: [PERFORM] 4s query want to run faster
> The following query takes about 4s to run on a 16GB RAM server. Any ideas why it doesn't use the indexes on the primary keys in the join conditions?

Maybe random_page_cost is set too high? What version are you using?

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
Re: [PERFORM] 4s query want to run faster
On Thu, Feb 21, 2008 at 6:10 PM, Claus Guttesen <[EMAIL PROTECTED]> wrote:
> > The following query takes about 4s to run on a 16GB RAM server. Any ideas why it doesn't use the indexes on the primary keys in the join conditions?
>
> Maybe random_page_cost is set too high? What version are you using?

Postgresql v. 8.2.1
Re: [PERFORM] 4s query want to run faster
> > > why it doesn't use the indexes on the primary keys in the join conditions?
> >
> > Maybe random_page_cost is set too high? What version are you using?
>
> Postgresql v. 8.2.1

You can try to lower this value. The default (in 8.3) is 4.

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
Re: [PERFORM] 4s query want to run faster
On Thu, Feb 21, 2008 at 2:48 PM, Adonias Malosso <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> The following query takes about 4s to run on a 16GB RAM server. Any ideas why it doesn't use the indexes on the primary keys in the join conditions?
>
> select i.inuid, count(*) as total
> from cte.instrumentounidade i
> inner join cte.pontuacao p on p.inuid = i.inuid
> inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
> inner join cte.subacaoindicador si on si.aciid = ai.aciid
> where i.itrid = 2 and p.ptostatus = 'A'
> group by i.inuid
> having count(*) > 0

What does explain analyze say about that query?
Re: [PERFORM] 4s query want to run faster
Setting random_page_cost = 2 solved the problem. Thanks.

On Thu, Feb 21, 2008 at 6:16 PM, Claus Guttesen <[EMAIL PROTECTED]> wrote:
> > > > why it doesn't use the indexes on the primary keys in the join conditions?
> > >
> > > Maybe random_page_cost is set too high? What version are you using?
> >
> > Postgresql v. 8.2.1
>
> You can try to lower this value. The default (in 8.3) is 4.
Re: [PERFORM] 4s query want to run faster
Well, all the row counts in expected and actual are pretty close. I'm guessing it's as optimized as it's likely to get. You could try mucking about with random_page_cost to force index usage, but indexes are not always a win in pgsql, hence the seq scans etc. If the number of rows returned represents a large percentage of the total number of rows in the table, then a seq scan is generally a win. Note that most of the time being spent in this query is on the Hash Join, not on the seq scans.

Also, you should really update to 8.2.6, the latest 8.2 version. Check the release notes for the bugs that were fixed between 8.2.1 and 8.2.6.
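A quick, session-local way to check whether the seq scans really are the win here is to disable them for one session and compare plans. This uses the query from earlier in the thread and the standard enable_seqscan planner toggle:

```sql
-- Session-only experiment; never set enable_seqscan = off in postgresql.conf.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT i.inuid, count(*) AS total
FROM cte.instrumentounidade i
JOIN cte.pontuacao p ON p.inuid = i.inuid
JOIN cte.acaoindicador ai ON ai.ptoid = p.ptoid
JOIN cte.subacaoindicador si ON si.aciid = ai.aciid
WHERE i.itrid = 2 AND p.ptostatus = 'A'
GROUP BY i.inuid
HAVING count(*) > 0;

-- Restore normal planning before doing anything else in the session.
RESET enable_seqscan;
```

If the forced index plan is not actually faster than the hash-join plan above, the planner was right and further cost-parameter tweaking is unlikely to help this query.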
Re: [PERFORM] 4s query want to run faster
The other parameter you might want to look at is effective_cache_size - increasing it will encourage index use. On a machine with 16GB the default is probably too small (there are various recommendations about how to set this; ISTR either Scott M or Greg Smith had a page somewhere that covered this quite well - guys?).

Obviously, decreasing random_page_cost fixed this query for you, but if you find yourself needing to tweak it again for other queries, then look at changing effective_cache_size.

Cheers

Mark

Adonias Malosso wrote:
> Setting random_page_cost = 2 solved the problem. Thanks.
Re: [PERFORM] 4s query want to run faster
On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood <[EMAIL PROTECTED]> wrote:
> The other parameter you might want to look at is effective_cache_size - increasing it will encourage index use. On a machine with 16GB the default is probably too small (there are various recommendations about how to set this; ISTR either Scott M or Greg Smith had a page somewhere that covered this quite well - guys?).
>
> Obviously, decreasing random_page_cost fixed this query for you, but if you find yourself needing to tweak it again for other queries, then look at changing effective_cache_size.

effective_cache_size is pretty easy to set, and it's not very sensitive to small changes, so guesstimation is fine where it's concerned. Basically, let your machine run for a while, then add together the cache and buffer figures your unix kernel reports (top and free will tell you these things). If you're running other apps on the server, make a SWAG (scientific wild-assed guess) at how much the other apps are pounding on the kernel cache/buffer, estimate how much of the total PostgreSQL is actually using, and set effective_cache_size to that.

If your data set fits into memory, then setting random_page_cost closer to 1 makes a lot of sense, as does a larger effective_cache_size.
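The guesstimation procedure above can be sketched as a conf fragment. The numbers are hypothetical for a 16GB Linux box where PostgreSQL is the only significant service; substitute what top/free actually report on your machine:

```
# Suppose `free -m` shows buffers + cached totalling roughly 12000 MB once the
# machine has been running a while, and nothing else competes for the cache.
effective_cache_size = 12GB   # planner hint only; allocates no memory itself
random_page_cost = 2.0        # reasonable when most of the data set is cached
```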
Re: [PERFORM] 4s query want to run faster
On 21-Feb-08, at 6:16 PM, Scott Marlowe wrote:
> On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood <[EMAIL PROTECTED]> wrote:
> > The other parameter you might want to look at is effective_cache_size - increasing it will encourage index use. On a machine with 16GB the default is probably too small (there are various recommendations about how to set this; ISTR either Scott M or Greg Smith had a page somewhere that covered this quite well - guys?).

The default is always too small in my experience.

What are the rest of the configuration values?

Dave
[PERFORM] config settings, was: 4s query want to run faster
On Thu, Feb 21, 2008 at 5:40 PM, Dave Cramer <[EMAIL PROTECTED]> wrote:
> On 21-Feb-08, at 6:16 PM, Scott Marlowe wrote:
> > On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood <[EMAIL PROTECTED]> wrote:
> > > The other parameter you might want to look at is effective_cache_size - increasing it will encourage index use. On a machine with 16GB the default is probably too small (there are various recommendations about how to set this; ISTR either Scott M or Greg Smith had a page somewhere that covered this quite well - guys?).
>
> The default is always too small in my experience.
>
> What are the rest of the configuration values?

I was thinking that we almost need a matrix of small, typical, large, and too-big values for each setting, per version and per hardware config.

max_connections is the one I see abused a lot here. It's a setting you can set way too high and not notice there's a problem until you go to actually use that many connections and find your database performance just went south. One should closely monitor connection usage and track it over time, as well as benchmark the behavior of your db under realistic but heavy load. You should know how many connections you can handle in a test setup before things get ugly, and then avoid setting max_connections any higher than about half that if you can do it.

The same kind of thinking applies to any resource with a straight-line 1:1 increase in usage, or a tendency toward that, like work_mem (formerly sort_mem). Dammit, nearly every setting really needs its own mini-howto on how to set it... They are all covered in the runtime config section of the docs.
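The connection monitoring Scott describes can be done with the standard pg_stat_activity view; a minimal sketch (the view and current_setting() exist in the 8.x versions discussed here):

```sql
-- Compare current connection usage to the configured ceiling; sample this
-- periodically over time (e.g. from cron) to see how close you run to the limit.
SELECT count(*)                            AS connections_in_use,
       current_setting('max_connections')  AS max_connections
FROM pg_stat_activity;
```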
Re: [PERFORM] 4s query want to run faster
Scott Marlowe wrote:
> effective_cache_size is pretty easy to set, and it's not very sensitive to small changes, so guesstimation is fine where it's concerned. Basically, let your machine run for a while, then add together the cache and buffer figures your unix kernel reports (top and free will tell you these things). If you're running other apps on the server, make a SWAG (scientific wild-assed guess) at how much the other apps are pounding on the kernel cache/buffer, estimate how much of the total PostgreSQL is actually using, and set effective_cache_size to that.

FWIW - the "buffered|cached" figures may well be called something different if you are not on Linux (I didn't see any platform mentioned - sorry if I missed it). E.g. on FreeBSD it is "Inactive" that shows what the OS is caching, and "Cached" actually means something slightly different... (yep, that's caused a lot of confusion in the past...)

Cheers

Mark