Re: [PERFORM] planner favors seq scan too early

2008-02-21 Thread Richard Huxton

Markus Bertheau wrote:


I'm getting a plan that uses a sequential scan on ext_feeder_item instead of
several index scans, which slows down the query significantly:

# explain analyze select fi.pub_date from ext_feeder_item fi where fi.feed_id
  in (select id from ext_feeder_feed ff where ff.is_system) order by pub_date
  desc;
 Sort  (cost=298545.70..299196.46 rows=260303 width=8) (actual time=89299.623..89302.146 rows=807 loops=1)



Using LIMIT in the subquery I can see that, starting at 50 values for the IN
list, the planner starts to prefer the seq scan. Plan for 49:



 Sort  (cost=277689.24..277918.39 rows=91660 width=8) (actual time=477.769..478.193 rows=137 loops=1)



Note that the rows estimate for the index scan is way off. Increasing
statistics target for ext_feeder_item.feed_id to 100 lets the planner favor the
index scan up to LIMIT 150 for the subquery.

Using enable_seqscan=false, I see that the index scan plan continues to
outperform the seqscan plan even with limit 1500 in the subquery (1196 values
actually returned from it):



 Sort  (cost=100925142.27..100925986.74 rows=337787 width=8) (actual time=102.111..104.627 rows=807 loops=1)



Why does the planner choose that way and what can I do to make it choose the
better plan, preferably without specifying limit and a maybe unreasonably high
statistics target for ext_feeder_item.feed_id?


Although the index scans are fast, their cost estimate is much higher than the seq scan's.

This suggests you need to tweak your planner cost settings:
http://www.postgresql.org/docs/8.3/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

I'd probably start with reducing random_page_cost if you have a 
reasonable disk system and making sure effective_cache_size is 
accurately set.
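
As a concrete (and hedged) sketch of those two knobs - the values below are
illustrative starting points, not recommendations from this thread:

```
# postgresql.conf -- planner cost constants (8.3)
random_page_cost = 2.0        # default is 4.0; lower it if the disk system
                              # is fast or the data is mostly cached
effective_cache_size = 6GB    # rough size of the OS page cache available
                              # to Postgres; the default is far smaller
```

Both can also be tried per-session first (SET random_page_cost = 2.0;) before
committing them to postgresql.conf.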


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Question about shared_buffers and cpu usage

2008-02-21 Thread Dave Cramer


On 21-Feb-08, at 12:13 AM, bh yuan wrote:


Hi

I am using Postgres 8.3 on an 8GB-memory, Xeon X5355 Quad Core x 2
processor RH5 machine with 10GB of data (some tables have
about 2,000,000 ~ 5,000,000 rows).

I have two questions.
1. How should I set shared_buffers and the other postgresql.conf
parameters for best performance?
I only run Postgres 8.3 on the machine, so I set shared_buffers
= 7168MB (7GB).
But somebody said it is too big, so I am confused.


Yes, it is too big! Make it 2GB to start.
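
For the 8GB machine above, a hedged starting point might look like this (the
exact values are illustrative, assuming the box is dedicated to Postgres):

```
# postgresql.conf -- illustrative starting point for a dedicated 8GB server
shared_buffers = 2GB          # ~25% of RAM; 7GB would starve the OS cache
effective_cache_size = 6GB    # rough estimate of memory left for OS caching
```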







Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-21 Thread Matthew

On Wed, 20 Feb 2008, Tom Lane wrote:

However, this resulted in random errors from Postgres - something to do
with locked tables. So I changed it so that no two threads create indexes
for the same table at once, and that solved it.


How long ago was that?  There used to be some issues with two CREATE
INDEXes both trying to update the pg_class row, but I thought we'd fixed
it.


It was a while back, and that sounds like exactly the error it returned. 
It sounds like you have fixed it.


Matthew

--
Software suppliers are trying to make their software packages more
'user-friendly' Their best approach, so far, has been to take all
the old brochures, and stamp the words, 'user-friendly' on the cover.
-- Bill Gates



Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-21 Thread Guillaume Cottenceau
Jeff  writes:

> I wonder if it would be worthwhile if pg_restore could emit a warning
> if maint_work_mem is "low" (start flamewar on what "low" is).
>
> And as an addition to that - allow a cmd line arg to have pg_restore
> bump it before doing its work?  On several occasions I was moving a
> largish table and the COPY part went plenty fast, but when it hit
> index creation it slowed down to a crawl due to low maint_work_mem..

I have made a comparison restoring a production dump with default
and large maintenance_work_mem. The speedup here is only 5%
(12'30 => 11'50).

Apparently, on the restored database, data is 1337 MB[1] and
indexes 644 MB[2]. Pg is 8.2.3, checkpoint_segments 3,
maintenance_work_mem default (16MB) then 512MB, shared_buffers
384MB. The disks are rather slow (Dell's LSI Logic RAID1); hdparm
reports 82 MB/sec for reads.

Ref: 
[1] db=# SELECT sum(relpages)*8/1024 FROM pg_class, pg_namespace WHERE 
pg_namespace.oid = pg_class.relnamespace AND relkind = 'r' AND nspname = 
'public';
 ?column? 
----------
     1337

(query run after ANALYZE)

notice there are only a few toast pages to account for:

db=# SELECT relname, relpages FROM pg_class WHERE relname like '%toast%' 
ORDER BY relpages DESC;
       relname        | relpages 
----------------------+----------
 pg_toast_2618        |       17
 pg_toast_2618_index  |        2
 pg_toast_87570_index |        1
 pg_toast_87582_index |        1
 (...)

[2] db=# SELECT sum(relpages)*8/1024 FROM pg_class, pg_namespace WHERE 
pg_namespace.oid = pg_class.relnamespace AND relkind = 'i' AND nspname = 
'public';
 ?column? 
----------
      644

-- 
Guillaume Cottenceau



Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-21 Thread Tom Lane
Guillaume Cottenceau <[EMAIL PROTECTED]> writes:
> I have made a comparison restoring a production dump with default
> and large maintenance_work_mem. The speedup improvement here is
> only of 5% (12'30 => 11'50).

> Apparently, on the restored database, data is 1337 MB[1] and
> indexes 644 MB[2]. Pg is 8.2.3, checkpoint_segments 3,
> maintenance_work_mem default (16MB) then 512MB, shared_buffers
> 384MB. It is rather slow disks (Dell's LSI Logic RAID1), hdparm
> reports 82 MB/sec for reads.

The main thing that jumps out at me is that boosting checkpoint_segments
would probably help.  I tend to set it to 30 or so (note that this
corresponds to about 1GB taken up by pg_xlog).
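
The arithmetic behind that rule of thumb, as a sketch (16MB is the standard
WAL segment size):

```
# postgresql.conf
checkpoint_segments = 30
# pg_xlog peaks at roughly (2 * checkpoint_segments + 1) segments
# of 16MB each: (2*30 + 1) * 16MB = 976MB, i.e. about 1GB.
```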

regards, tom lane



Re: [PERFORM] 7 hrs for a pg_restore?

2008-02-21 Thread Vivek Khera


On Feb 21, 2008, at 12:28 PM, Guillaume Cottenceau wrote:


I have made a comparison restoring a production dump with default
and large maintenance_work_mem. The speedup here is only 5%
(12'30 => 11'50).


At one point I was evaluating several server vendors and did a bunch  
of DB restores.  The one thing that gave me the biggest benefit was to  
bump the number of checkpoint segments to a high number, like 128 or  
256.  Everything else was mostly minor increases in speed.






[PERFORM] 4s query want to run faster

2008-02-21 Thread Adonias Malosso
Hi all,

The following query takes about 4s to run on a 16GB RAM server. Any ideas
why it doesn't use the indexes on the primary keys in the join conditions?

select i.inuid, count(*) as total
from cte.instrumentounidade i
inner join cte.pontuacao p on p.inuid = i.inuid
inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
inner join cte.subacaoindicador si on si.aciid = ai.aciid
where i.itrid = 2 and p.ptostatus = 'A'
group by i.inuid
having count(*) > 0

HashAggregate  (cost=47905.87..47941.01 rows=2008 width=4)
  Filter: (count(*) > 0)
  ->  Hash Join  (cost=16307.79..46511.45 rows=185923 width=4)
        Hash Cond: (si.aciid = ai.aciid)
        ->  Seq Scan on subacaoindicador si  (cost=0.00..22812.17 rows=368817 width=4)
        ->  Hash  (cost=16211.40..16211.40 rows=38556 width=8)
              ->  Hash Join  (cost=9018.20..16211.40 rows=38556 width=8)
                    Hash Cond: (p.inuid = i.inuid)
                    ->  Hash Join  (cost=8908.41..15419.10 rows=39593 width=8)
                          Hash Cond: (ai.ptoid = p.ptoid)
                          ->  Seq Scan on acaoindicador ai  (cost=0.00..4200.84 rows=76484 width=8)
                          ->  Hash  (cost=8678.33..8678.33 rows=92034 width=8)
                                ->  Seq Scan on pontuacao p  (cost=0.00..8678.33 rows=92034 width=8)
                                      Filter: (ptostatus = 'A'::bpchar)
                    ->  Hash  (cost=104.78..104.78 rows=2008 width=4)
                          ->  Seq Scan on instrumentounidade i  (cost=0.00..104.78 rows=2008 width=4)
                                Filter: (itrid = 2)


Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Adonias Malosso
HashAggregate  (cost=47818.40..47853.12 rows=1984 width=4) (actual time=5738.879..5743.390 rows=1715 loops=1)
  Filter: (count(*) > 0)
  ->  Hash Join  (cost=16255.99..46439.06 rows=183912 width=4) (actual time=1887.974..5154.207 rows=241693 loops=1)
        Hash Cond: (si.aciid = ai.aciid)
        ->  Seq Scan on subacaoindicador si  (cost=0.00..22811.98 rows=368798 width=4) (actual time=0.108..1551.816 rows=368798 loops=1)
        ->  Hash  (cost=16160.64..16160.64 rows=38141 width=8) (actual time=1887.790..1887.790 rows=52236 loops=1)
              ->  Hash Join  (cost=9015.31..16160.64 rows=38141 width=8) (actual time=980.058..1773.530 rows=52236 loops=1)
                    Hash Cond: (p.inuid = i.inuid)
                    ->  Hash Join  (cost=8905.89..15376.11 rows=39160 width=8) (actual time=967.116..1568.028 rows=54225 loops=1)
                          Hash Cond: (ai.ptoid = p.ptoid)
                          ->  Seq Scan on acaoindicador ai  (cost=0.00..4200.84 rows=76484 width=8) (actual time=0.080..259.412 rows=76484 loops=1)
                          ->  Hash  (cost=8678.33..8678.33 rows=91026 width=8) (actual time=966.841..966.841 rows=92405 loops=1)
                                ->  Seq Scan on pontuacao p  (cost=0.00..8678.33 rows=91026 width=8) (actual time=0.087..746.528 rows=92405 loops=1)
                                      Filter: (ptostatus = 'A'::bpchar)
                    ->  Hash  (cost=104.46..104.46 rows=1984 width=4) (actual time=12.913..12.913 rows=1983 loops=1)
                          ->  Seq Scan on instrumentounidade i  (cost=0.00..104.46 rows=1984 width=4) (actual time=0.091..8.879 rows=1983 loops=1)
                                Filter: (itrid = 2)
Total runtime: 5746.415 ms

On Thu, Feb 21, 2008 at 5:58 PM, Scott Marlowe <[EMAIL PROTECTED]>
wrote:

> On Thu, Feb 21, 2008 at 2:48 PM, Adonias Malosso <[EMAIL PROTECTED]>
> wrote:
> > Hi all,
> >
> > The following query takes about 4s to run in a 16GB RAM server. Any
> > ideas why it doesn't use index for the primary keys in the join
> > conditions?
> >
> > select i.inuid, count(*) as total
> > from cte.instrumentounidade i
> >  inner join cte.pontuacao p on p.inuid = i.inuid
> > inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
> > inner join cte.subacaoindicador si on si.aciid = ai.aciid
> > where i.itrid = 2 and p.ptostatus = 'A'
> > group by i.inuid
> >  having count(*) > 0
>
> What does explain analyze say about that query?
>


Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Claus Guttesen
> The following query takes about 4s to run on a 16GB RAM server. Any ideas
> why it doesn't use the indexes on the primary keys in the join conditions?

Maybe random_page_cost is set too high? What version are you using?

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare



Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Adonias Malosso
On Thu, Feb 21, 2008 at 6:10 PM, Claus Guttesen <[EMAIL PROTECTED]> wrote:

> > The following query takes about 4s to run in a 16GB RAM server. Any
> > ideas why it doesn't use index for the primary keys in the join
> > conditions?
>
> Maybe random_page_cost is set too high? What version are you using?


Postgresql v. 8.2.1


>
> --
> regards
> Claus
>
> When lenity and cruelty play for a kingdom,
> the gentlest gamester is the soonest winner.
>
> Shakespeare
>


Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Claus Guttesen
> > > why it doesn't use index for the primary keys in the join conditions?
> >
> > Maybe random_page_cost is set too high? What version are you using?
>
> Postgresql v. 8.2.1

You can try to lower this value. The default (in 8.3) is 4.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare



Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Scott Marlowe
On Thu, Feb 21, 2008 at 2:48 PM, Adonias Malosso <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> The following query takes about 4s to run on a 16GB RAM server. Any ideas
> why it doesn't use the indexes on the primary keys in the join conditions?
>
> select i.inuid, count(*) as total
> from cte.instrumentounidade i
>  inner join cte.pontuacao p on p.inuid = i.inuid
> inner join cte.acaoindicador ai on ai.ptoid = p.ptoid
> inner join cte.subacaoindicador si on si.aciid = ai.aciid
> where i.itrid = 2 and p.ptostatus = 'A'
> group by i.inuid
>  having count(*) > 0

What does explain analyze say about that query?



Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Adonias Malosso
Setting random_page_cost = 2 solved the problem. Thanks.

On Thu, Feb 21, 2008 at 6:16 PM, Claus Guttesen <[EMAIL PROTECTED]> wrote:

> > > > why it doesn't use index for the primary keys in the join
> > > > conditions?
> > >
> > > Maybe random_page_cost is set too high? What version are you using?
> >
> > Postgresql v. 8.2.1
>
> You can try to lower this value. The default (in 8.3) is 4.
>
> --
> regards
> Claus
>
> When lenity and cruelty play for a kingdom,
> the gentlest gamester is the soonest winner.
>
> Shakespeare
>


Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Scott Marlowe
Well, all the row counts in expected and actual are pretty close.  I'm
guessing it's as optimized as it's likely to get.  You could try
mucking about with random_page_cost to force index usage, but indexes
are not always a win in pgsql, hence the seq scans etc...  If the
number of rows returned represents a large percentage of the total
number of rows in the table, then a seq scan is generally a win.  Note
that nearly all the time being spent in this query is on the Hash Join,
not on the seq scans.

Also, you should really update to 8.2.6, the latest 8.2 version.  Check
the release notes for the bugs that were fixed between 8.2.1 and 8.2.6.



Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Mark Kirkwood
The other parameter you might want to look at is effective_cache_size - 
increasing it will encourage index use. On a machine with 16GB the 
default is probably too small (there are various recommendations about 
how to set this; ISTR either Scott M or Greg Smith had a page somewhere 
that covered this quite well - guys?).


Obviously, decreasing random_page_cost fixed this query for you, but if 
you find yourself needing to tweak it again for other queries, then look 
at changing effective_cache_size.


Cheers

Mark


Adonias Malosso wrote:

Set random_page_cost = 2 solved the problem. thanks






Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Scott Marlowe
On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood <[EMAIL PROTECTED]> wrote:
> The other parameter you might want to look at is effective_cache_size -
>  increasing it will encourage index use. On a machine with 16GB the
>  default is probably too small (there are various recommendations about
>  how to set this ISTR either Scott M or Greg Smith had a page somewhere
>  that covered this quite well - guys?).
>
>  Obviously, decreasing random_page_cost fixed this query for you, but if
>  you find yourself needing to tweak it again for other queries, then look at
>  changing effective_cache_size.

effective_cache_size is pretty easy to set, and it's not real
sensitive to small changes, so guesstimation is fine where it's
concerned.  Basically, let your machine run for a while, then add
together the cache and buffers your unix kernel is using (top and free
will tell you these things).  If you're running other apps on the
server, make a SWAG (scientific wild assed guess) at how much the other
apps are pounding on the kernel cache / buffers, estimate how much of
the total postgresql is actually getting, and set effective_cache_size
to that.
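
On Linux, that estimation can be sketched from /proc/meminfo (an assumption -
that's where the kernel exposes these counters; other systems name them
differently):

```shell
#!/bin/sh
# Sketch: estimate effective_cache_size from the kernel's current
# buffers + page cache (Linux /proc/meminfo reports them in kB).
# Run after the machine has been up and busy for a while, and subtract
# whatever other apps are using, per the advice above.
buffers_kb=$(awk '/^Buffers:/ {print $2}' /proc/meminfo)
cached_kb=$(awk '/^Cached:/ {print $2}' /proc/meminfo)
echo "suggested effective_cache_size = $(( (buffers_kb + cached_kb) / 1024 ))MB"
```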

If your data set fits into memory, then setting random_page_cost
closer to 1 makes a lot of sense, as does a larger effective_cache_size.



Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Dave Cramer


On 21-Feb-08, at 6:16 PM, Scott Marlowe wrote:

On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood <[EMAIL PROTECTED]> wrote:

The other parameter you might want to look at is effective_cache_size -
increasing it will encourage index use. On a machine with 16GB the
default is probably too small (there are various recommendations about
how to set this ISTR either Scott M or Greg Smith had a page somewhere
that covered this quite well - guys?).


The default is always too small in my experience.

What are the rest of the configuration values ?

Dave




[PERFORM] config settings, was: 4s query want to run faster

2008-02-21 Thread Scott Marlowe
On Thu, Feb 21, 2008 at 5:40 PM, Dave Cramer <[EMAIL PROTECTED]> wrote:
>
>  On 21-Feb-08, at 6:16 PM, Scott Marlowe wrote:
>
>  > On Thu, Feb 21, 2008 at 4:59 PM, Mark Kirkwood
>  > <[EMAIL PROTECTED]> wrote:
>  >> The other parameter you might want to look at is
>  >> effective_cache_size -
>  >> increasing it will encourage index use. On a machine with 16GB the
>  >> default is probably too small (there are various recommendations
>  >> about
>  >> how to set this ISTR either Scott M or Greg Smith had a page
>  >> somewhere
>  >> that covered this quite well - guys?).
>  >>
>  The default is always too small in my experience.
>
>  What are the rest of the configuration values ?

I was thinking that we almost need a matrix of versions and small,
typical, large, and too-big settings for each version, and which
hardware configs.

max_connections is the one I see abused a lot here.  It's a setting
that you can set way too high and not notice there's a problem until
you go to actually use that many connections and find out your
database performance just went south.

One should closely monitor connection usage and track it over time, as
well as benchmark the behavior of your db under realistic but heavy
load.  You should know how many connections you can handle in a test
setup before things get ugly, and then avoid setting max_connections
any higher than about half that if you can do it.  The same kind of
thinking applies to any resource with a straight-line 1:1 increase in
resource usage, or a tendency towards that, like work_mem (formerly
sort_mem).  Dammit, nearly every one really needs its own mini-howto
on how to set it...  They are all covered in the runtime config
section of the docs.



Re: [PERFORM] 4s query want to run faster

2008-02-21 Thread Mark Kirkwood

Scott Marlowe wrote:


effective_cache_size is pretty easy to set, and it's not real
sensitive to small changes, so guesstimation is fine where it's
concerned.  Basically, let your machine run for a while, then  add the
cache and buffer your unix kernel has altogether (top and free will
tell you these things).  If you're running other apps on the server,
make a SWAG (scientific wild assed guess) how much the other apps are
pounding on the kernel cache / buffer and set effective_cache_size to
how much you think postgresql is using of the total and set it to
that.
  


FWIW - the buffered|cached numbers may well be called something
different if you are not on Linux (I didn't see any platform mentioned
- sorry if I missed it) - e.g. for FreeBSD it is "Inactive" that shows
what the os is caching, and "Cached" actually means something slightly
different... (yep, that's caused a lot of confusion in the past...)


Cheers

Mark
