Re: [PERFORM] CPU load

2008-09-29 Thread kiki
Thanks’ for the instructions for detecting the problem.
It helped a lot.

First I have increased shared_buffers from 2000 to 8000. Since the
postgresql is on Debian I had to increase SHMMAX kernel value.
Everything is working much faster now.
There is still heavy load of postmaster process (up to 100%) for a simple
query

EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE id_camera='3' AND
confirmed='false' AND dismissed='false' ORDER BY date DESC, time DESC
LIMIT 1;

(the table is indexed by id_camera, has around 1 milion rows, and this
query returns around 70 rows and is executed (EXPLAIN ANALYSE) in
around 4800 ms, and this table is queried a lot although not so often
queried modified)

but I don't think that is strange behavior of the postgresql.
And it is exhibited all the time; the postgresql reset does not influence
it at all.
Once again thanks a lot, I learned a lot.

Regards,
Maja
> It would be useful to confirm that this is a backend process.
> With top, hit the 'c' key to show the full path / description of the
> process.
> Backend postgres processes should then have more useful descriptions of
> what
> they are doing and identifying themselves.
> You can also confirm what query is causing that by lining up the process
> id
> from top with the one returned by:
>
> select current_query, procpid from pg_stat_activity where current_query
> not
> like '
> Or by simply using the process id for the where clause (where procpid = ).
>
> How often is the table being queried modified?  Between the startup when
> the
> query is fast, and when it slows down, is there a lot of modification to
> its
> rows?
>
>
> On Fri, Sep 26, 2008 at 5:52 AM, Albe Laurenz
> <[EMAIL PROTECTED]>wrote:
>
>> kiki wrote:
>> > The number of rows returned by the query varies, right now is:
>> >
>> > 49 row(s)
>> > Total runtime: 3,965.718 ms
>> > The table currently has 971582 rows.
>> >
>> > But the problem is that when database server is restarted everything
>> works
>> > fine and fast. No heavy loads of the processor and as time passes
>> > situation with the processor is worsen.
>>
>> It would be interesting to know the result of EXPLAIN ANALYZE for the
>> query, both when it performs well and when it doesn't.
>>
>> One thing I see right away when I look at your postgresql.conf is that
>> you have set shared_buffers to an awfully small value of 2000, when you
>> have
>> enough memory on the machine (vmstat reports 2GB free memory, right?).
>>
>> Does the situation improve if you set it to a higher value?
>>
>> Yours,
>> Laurenz Albe
>>
>> --
>> Sent via pgsql-performance mailing list
>> (pgsql-performance@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread Harald Armin Massa
Hello Maja,

> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE id_camera='3' AND
> confirmed='false' AND dismissed='false' ORDER BY date DESC, time DESC
> LIMIT 1;
>
> (the table is indexed by id_camera, has around 1 milion rows, and this
> query returns around 70 rows and is executed (EXPLAIN ANALYSE) in
> around 4800 ms, and this table is queried a lot although not so often
> queried modified)

700.000 of 1.000.000 rows is around 70% ... that are nearly all rows.
As much as I read you, this table is not often modified. What reason
is there for quering all that data again and again instead of keeping
it in memory (should it be really needed) ?


Harald

-- 
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
EuroPython 2009 will take place in Birmingham - Stay tuned!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread Albe Laurenz
kiki wrote:
> First I have increased shared_buffers from 2000 to 8000. Since the
> postgresql is on Debian I had to increase SHMMAX kernel value.
> Everything is working much faster now.

Good to hear that the problem is gone.

> There is still heavy load of postmaster process (up to 100%) for a simple
> query
> 
> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE id_camera='3' AND
> confirmed='false' AND dismissed='false' ORDER BY date DESC, time DESC
> LIMIT 1;
> 
> (the table is indexed by id_camera, has around 1 milion rows, and this
> query returns around 70 rows and is executed (EXPLAIN ANALYSE) in
> around 4800 ms, and this table is queried a lot although not so often
> queried modified)
> 
> but I don't think that is strange behavior of the postgresql.
> And it is exhibited all the time; the postgresql reset does not influence
> it at all.

I'd expect a sequential scan for a query that returns 70% of the table.

But I cannot believe that this query returns more than one row since
it has a "LIMIT 1". Can you enlighten me?

In the above query (with LIMIT 1), maybe an index on "date" could help.

Yours,
Laurenz Albe


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread kiki
Sorry, without LIMIT returns around 70 rows.
Tried to index date column and time column but the performance is pretty
much the same.
Everything is OK, I just don’t understand way is this query burdening the
processor so much.

Regards,
Maja

> kiki wrote:
>> First I have increased shared_buffers from 2000 to 8000. Since the
>> postgresql is on Debian I had to increase SHMMAX kernel value.
>> Everything is working much faster now.
>
> Good to hear that the problem is gone.
>
>> There is still heavy load of postmaster process (up to 100%) for a
>> simple
>> query
>>
>> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE id_camera='3' AND
>> confirmed='false' AND dismissed='false' ORDER BY date DESC, time DESC
>> LIMIT 1;
>>
>> (the table is indexed by id_camera, has around 1 milion rows, and this
>> query returns around 70 rows and is executed (EXPLAIN ANALYSE) in
>> around 4800 ms, and this table is queried a lot although not so often
>> queried modified)
>>
>> but I don't think that is strange behavior of the postgresql.
>> And it is exhibited all the time; the postgresql reset does not
>> influence
>> it at all.
>
> I'd expect a sequential scan for a query that returns 70% of the table.
>
> But I cannot believe that this query returns more than one row since
> it has a "LIMIT 1". Can you enlighten me?
>
> In the above query (with LIMIT 1), maybe an index on "date" could help.
>
> Yours,
> Laurenz Albe
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread kiki
Hello Herald,

the queried table is used for communication between server application and
web user interface.
When application detects an event it writes it down in table.
The web client checks every 10 second if something new is written in the
table.
Usually nothing new is written but the client has to check it.
I don't fetch all rows, usually just the last one written.
The speed of the query is not a problem but the strange thing is the
processor load with postmaster when the query is executed.
I don’t now how to reduce processor load.
Should I change some other settings beside shared_buffers like work_mem?
Or maybe such processor load is OK?

Regards,
Maja

> Hello Maja,
>
>> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE id_camera='3' AND
>> confirmed='false' AND dismissed='false' ORDER BY date DESC, time DESC
>> LIMIT 1;
>>
>> (the table is indexed by id_camera, has around 1 milion rows, and this
>> query returns around 70 rows and is executed (EXPLAIN ANALYSE) in
>> around 4800 ms, and this table is queried a lot although not so often
>> queried modified)
>
> 700.000 of 1.000.000 rows is around 70% ... that are nearly all rows.
> As much as I read you, this table is not often modified. What reason
> is there for quering all that data again and again instead of keeping
> it in memory (should it be really needed) ?
>
>
> Harald
>
> --
> GHUM Harald Massa
> persuadere et programmare
> Harald Armin Massa
> Spielberger Straße 49
> 70435 Stuttgart
> 0173/9409607
> no fx, no carrier pigeon
> -
> EuroPython 2009 will take place in Birmingham - Stay tuned!
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread hubert depesz lubaczewski
On Mon, Sep 29, 2008 at 10:29:45AM +0200, [EMAIL PROTECTED] wrote:
> >> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE id_camera='3' AND
> >> confirmed='false' AND dismissed='false' ORDER BY date DESC, time DESC
> >> LIMIT 1;
> Sorry, without LIMIT returns around 70 rows.
> Tried to index date column and time column but the performance is pretty
> much the same.
> Everything is OK, I just don’t understand way is this query burdening the
> processor so much.

1. please do not top-post.
2. for this query, you can use this index:
create index xxx on system_alarm (id_camera, date, time) where confirmed = 
'false' and dismissed = 'false';
or you can make it without where:
create index xxx on system_alarm (id_camera, confirmed, dismissed, date, time);
but if you usually have the criteria "confirmed = 'false' and dismissed
= 'false'" then the first index should be faster.

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread Albe Laurenz
Please try to avoid top-posting where inappropriate.

kiki wrote:
>>> There is still heavy load of postmaster process (up to 100%) for a simple
>>> query
>>>
>>> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE id_camera='3' AND
>>> confirmed='false' AND dismissed='false' ORDER BY date DESC, time DESC
>>> LIMIT 1;
>>>
>>> (the table is indexed by id_camera, has around 1 milion rows, and this
>>> query returns around 70 rows and is executed (EXPLAIN ANALYSE) in
>>> around 4800 ms, and this table is queried a lot although not so often
>>> queried modified)
>>>
>>> but I don't think that is strange behavior of the postgresql.
>>> And it is exhibited all the time; the postgresql reset does not
>>> influence it at all.
>>
>> I'd expect a sequential scan for a query that returns 70% of the table.
>>
>> But I cannot believe that this query returns more than one row since
>> it has a "LIMIT 1". Can you enlighten me?
>>
>> In the above query (with LIMIT 1), maybe an index on "date" could help.
>
> Sorry, without LIMIT returns around 70 rows.
> Tried to index date column and time column but the performance is pretty
> much the same.
> Everything is OK, I just don't understand way is this query burdening the
> processor so much.

Yes, for the query without the LIMIT clause I wouldn't expect any gain from
indexing.

Probably the CPU load is caused by the sorting.
Does it look different if you omit ORDER BY?
Maybe the sort will perform better if you increase work_mem in postgresql.conf,
you could experiment with that.

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread Albe Laurenz
kiki wrote:
> The speed of the query is not a problem but the strange thing is the
> processor load with postmaster when the query is executed.
> I don’t now how to reduce processor load.

Did you try without the ORDER BY?
Where are the execution plans?

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread kiki
> kiki wrote:
>> The speed of the query is not a problem but the strange thing is the
>> processor load with postmaster when the query is executed.
>> I don’t now how to reduce processor load.
>
> Did you try without the ORDER BY?
> Where are the execution plans?
>
> Yours,
> Laurenz Albe
>

I expanded work_mem to 256 Mb and created index on table

create index xxx on system_alarm (id_camera, date, time) where confirmed =
'false' and dismissed = 'false';

the processor load now executing the query is max. 70%

the query execution with and without order is:

istra_system=> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE
id_camera='3' AND confirmed='false' AND dismissed='false' ;

 Seq Scan on system_alarm  (cost=0.00..24468.33 rows=735284 width=47)
(actual time=90.792..1021.967 rows=724846 loops=1)
   Filter: ((id_camera = 3) AND (NOT confirmed) AND (NOT dismissed))
 Total runtime: 1259.426 ms
(3 rows)

istra_system=> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE
id_camera='3' AND confirmed='false' AND dismissed='false' ORDER BY date
DESC, time ;

 Sort  (cost=96114.18..97952.39 rows=735284 width=47) (actual
time=2303.547..2602.116 rows=724846 loops=1)
   Sort Key: date, "time"
   ->  Seq Scan on system_alarm  (cost=0.00..24468.33 rows=735284
width=47) (actual time=100.322..1115.837 rows=724846 loops=1)
 Filter: ((id_camera = 3) AND (NOT confirmed) AND (NOT dismissed))
 Total runtime: 2916.557 ms
(5 rows)

I think this is OK.
Thanx


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] CPU load

2008-09-29 Thread Albe Laurenz
kiki wrote:
> I expanded work_mem to 256 Mb and created index on table
> 
> create index xxx on system_alarm (id_camera, date, time) where confirmed =
> 'false' and dismissed = 'false';

That index is not used for the query (as could be expected).
You better remove it.

> the processor load now executing the query is max. 70%
> 
> the query execution with and without order is:
> 
> istra_system=> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE
> id_camera='3' AND confirmed='false' AND dismissed='false' ;
> 
>  Seq Scan on system_alarm  (cost=0.00..24468.33 rows=735284 width=47) (actual 
> time=90.792..1021.967 rows=724846 loops=1)
>Filter: ((id_camera = 3) AND (NOT confirmed) AND (NOT dismissed))
>  Total runtime: 1259.426 ms
> (3 rows)
> 
> istra_system=> EXPLAIN ANALYSE  SELECT * FROM system_alarm WHERE
> id_camera='3' AND confirmed='false' AND dismissed='false' ORDER BY date
> DESC, time ;
> 
>  Sort  (cost=96114.18..97952.39 rows=735284 width=47) (actual 
> time=2303.547..2602.116 rows=724846 loops=1)
>Sort Key: date, "time"
>->  Seq Scan on system_alarm  (cost=0.00..24468.33 rows=735284 width=47) 
> (actual time=100.322..1115.837 rows=724846 loops=1)
>  Filter: ((id_camera = 3) AND (NOT confirmed) AND (NOT dismissed))
>  Total runtime: 2916.557 ms
> (5 rows)
> 
> I think this is OK.

I think so too.
I would say it is OK for the query to use much CPU during sort as long as this
does not last for too long.

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] dedicated server & postgresql 8.1 conf tunning

2008-09-29 Thread paul
Hello

I'm running pgsql 8.1.11 (from debian stable) on a server with 16GB RAM
(Linux helios 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64
GNU/Linux).
I have a table "tickets" with 1 000 000 insert by month ( ~2600 each 2hours
) (for the moment 1300 rows for 5GB )
and i have to extract statistics ( number of calls, number of calls less
than X seconds, number of news calles, number of calls from the new
callers, ...)



1°) The server will handle max 15 queries at a time.
So this is my postgresql.conf

max_connections = 15
shared_buffers =  995600 # ~1Go
temp_buffers = 1000
work_mem = 512000 # ~512Ko
maintenance_work_mem = 1048576 # 1Mo

max_fsm_pages = 41522880 # ~40Mo
max_fsm_relations = 8000 
checkpoint_segments = 10
checkpoint_timeout = 3600
effective_cache_size = 13958643712 # 13Go

stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
autovacuum = off

How can i optimize the configuration?




2°) My queries look like
SELECT tday AS n,
COUNT(DISTINCT(a.appelant)) AS new_callers,
COUNT(a.appelant) AS new_calls
FROM cirpacks.tickets AS a
WHERE LENGTH(a.appelant) > 4
AND a.service_id IN ( 95, 224, 35, 18 )
AND a.exploitant_id = 66
AND a.tyear = 2008
AND a.tmonth = 08
AND EXISTS ( SELECT 1 FROM cirpacks.clients AS b WHERE b.appelant =
a.appelant AND b.service_id IN ( 95, 224, 35, 18 ) AND b.heberge_id = 66
HAVING to_char(MIN(b.premier_appel), 'MMDD') = to_char(a.date,
'MMDD') )
GROUP BY n
ORDER BY n;

or select ... SUM( CASE WHEN condition THEN value ELSE 0) ... FROM
cirpacks.tickets WHERE tyear = ... and tmonth = ... and tday = ... AND
audiotel IN ( '...', '...' );
or select ... SUM( CASE WHEN condition THEN value ELSE 0) ... FROM
cirpacks.tickets WHERE '2007-01-01' <= date AND date <= '2008-08-31' AND
audiotel IN ( '...', '...' );


which indexes are the best ?
case 0:
index_0_0 (service_id, exploitant_id, palier_id, habillage_id, tweek, tday,
thour, tmonth, tyear, length(appelant::text))
index_0_1 (audiotel, cat, tweek, tday, thour, tmonth, tyear,
length(appelant::text))

or case 1
index_1_0 (audiotel, cat, service_id, exploitant_id, palier_id,
habillage_id, tweek, tday, thour, tmonth, tyear, length(appelant::text))

or case 2:
index_2_0 (tweek, tday, thour, tmonth, tyear, length(appelant::text))
index_2_1 (service_id, exploitant_id, palier_id, habillage_id)
index_2_2 (audiotel, cat)

or even (case 3)
index_3_0 (service_id, exploitant_id, palier_id, habillage_id, tyear,
length(appelant::text))
index_3_1 (service_id, exploitant_id, palier_id, habillage_id, tmonth,
tyear, length(appelant::text))
index_3_2 (service_id, exploitant_id, palier_id, habillage_id, tday,
tmonth, tyear, length(appelant::text))
[...]





-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Identical DB's, different execution plans

2008-09-29 Thread Doug Eck
I have two identical databases that run the same query each morning.  Starting 
this morning, something caused the first db to start using a different 
execution plan for the query, resulting in much worse performance.  I've have 
tried several things this morning, but I am currently stumped on what would be 
causing the different execution plans.

The query and the results of the explain analyze on the two db's:

db1=> explain analyze 
select
t1.bn,
t2.mu,
t1.nm,
t1.root,
t1.suffix,
t1.type
from
 t1,
 t2
where
t2.eff_dt = current_date
and t1.active = true
and t1.bn = t2.sn;

The slower plan used on db1:
   QUERY PLAN   
 
-
 Nested Loop  (cost=145.12..38799.61 rows=7876 width=47) (actual 
time=6.494..352.166 rows=8437 loops=1)
   ->  Bitmap Heap Scan on t2  (cost=145.12..19464.74 rows=10898 width=22) 
(actual time=6.472..22.684 rows=12204 loops=1)
 Recheck Cond: (eff_dt = ('now'::text)::date)
 ->  Bitmap Index Scan on t2_nu1  (cost=0.00..142.40 rows=10898 
width=0) (actual time=4.013..4.013 rows=24482 loops=1)
   Index Cond: (eff_dt = ('now'::text)::date)
   ->  Index Scan using t1_uc2 on t1  (cost=0.00..1.76 rows=1 width=32) (actual 
time=0.012..0.026 rows=1 loops=12204)
 Index Cond: ((t1.bn)::text = (t2.sn)::text)
 Filter: active
 Total runtime: 353.629 ms
(9 rows)

Time: 354.795 ms


And the faster plan from db2:


 QUERY 
PLAN 

 Merge Join  (cost=21371.63..21720.78 rows=7270 width=47) (actual 
time=60.412..80.865 rows=8437 loops=1)
   Merge Cond: ("outer"."?column6?" = "inner"."?column3?")
   ->  Sort  (cost=8988.56..9100.55 rows=44794 width=32) (actual 
time=30.685..33.370 rows=8438 loops=1)
 Sort Key: (t1.bn)::text
 ->  Seq Scan on t1  (cost=0.00..5528.00 rows=44794 width=32) (actual 
time=0.008..18.280 rows=8439 loops=1)
   Filter: active
   ->  Sort  (cost=12383.07..12409.32 rows=10500 width=22) (actual 
time=29.718..33.515 rows=12204 loops=1)
 Sort Key: (t2.sn)::text
 ->  Index Scan using t2_nu1 on t2  (cost=0.00..11681.77 rows=10500 
width=22) (actual time=0.052..13.295 rows=12204 loops=1)
   Index Cond: (eff_dt = ('now'::text)::date)
 Total runtime: 83.385 ms
(11 rows)

t2.eff_dt is defined as a date, t1.active is a boolean, all other fields are 
varchar.  Table t1 has a unique index (uc2) on field bn and a second unique 
index (uc3) on fields (root, suffix).  Table t2 has a unique index (uc1) on 
(sn, eff_dt), and a non-unique index (nu1) on eff_dt.

Table t1 has 12204 rows.  Table t2 has 7.1M rows, 12204 of which have eff_dt = 
current_date.  

Both database have autovacuum turned on, and both have been vacuumed and 
analyzed in the last 24 hours.

Any ideas as to what could the first db to opt for the slower subquery rather 
than the merge?

Thanks in advance.



  

Re: [PERFORM] Identical DB's, different execution plans

2008-09-29 Thread Tom Lane
Doug Eck <[EMAIL PROTECTED]> writes:
> Any ideas as to what could the first db to opt for the slower subquery rather 
> than the merge?

Not from the information given.  Presumably db1 thinks that the
mergejoin plan would be slower, but why it thinks that isn't clear yet.
Try setting enable_nestloop = off (and enable_hashjoin = off if it then
wants a hashjoin) and then post the EXPLAIN ANALYZE results.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Identical DB's, different execution plans

2008-09-29 Thread Doug Eck
Setting enable_nestloop = off did result in a hash join, so I also set 
enable_hashjoin = off.

The new plan from the slower db:
  QUERY PLAN
   
---
 Merge Join  (cost=20195.54..46442.99 rows=7876 width=47) (actual 
time=136.531..478.708 rows=8437 loops=1)
   Merge Cond: ((t1.bn)::text = "inner"."?column3?")
   ->  Index Scan using t1_uc2 on t1  (cost=0.00..25604.74 rows=204906 
width=32) (actual time=0.061..327.285 rows=8438 loops=1)
 Filter: active
   ->  Sort  (cost=20195.54..20222.79 rows=10898 width=22) (actual 
time=136.461..138.621 rows=12204 loops=1)
 Sort Key: (t2.sn)::text
 ->  Bitmap Heap Scan on t2  (cost=145.12..19464.74 rows=10898 
width=22) (actual time=7.580..120.144 rows=12204 loops=1)
   Recheck Cond: (eff_dt = ('now'::text)::date)
   ->  Bitmap Index Scan on t2_nu1  (cost=0.00..142.40 rows=10898 
width=0) (actual time=4.964..4.964 rows=24483 loops=1)
 Index Cond: (eff_dt = ('now'::text)::date)
 Total runtime: 480.344 ms
(11 rows)

And the faster one:

 QUERY 
PLAN 

 Merge Join  (cost=21371.63..21720.78 rows=7270 width=47) (actual 
time=60.435..80.604 rows=8437 loops=1)
   Merge Cond: ("outer"."?column6?" = "inner"."?column3?")
   ->  Sort  (cost=8988.56..9100.55 rows=44794 width=32) (actual 
time=30.498..33.093 rows=8438 loops=1)
 Sort Key: (t1.bn)::text
 ->  Seq Scan on t1  (cost=0.00..5528.00 rows=44794 width=32) (actual 
time=0.010..17.950 rows=8439 loops=1)
   Filter: active
   ->  Sort  (cost=12383.07..12409.32 rows=10500 width=22) (actual 
time=29.928..33.658 rows=12204 loops=1)
 Sort Key: (t2.sn)::text
 ->  Index Scan using t2_nu1 on t2  (cost=0.00..11681.77 rows=10500 
width=22) (actual time=0.062..13.356 rows=12204 loops=1)
   Index Cond: (eff_dt = ('now'::text)::date)
 Total runtime: 83.054 ms
(11 rows)

And the query again:

explain analyze 
select
t1.bn,
t2.mu,
t1.nm,
t1.root,
t1.suffix,
t1.type
from
 t1,
 t2
where
t2.eff_dt = current_date
and t1.active = true
and t1.bn = t2.sn;

Thanks.



- Original Message 
From: Tom Lane <[EMAIL PROTECTED]>
To: Doug Eck <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Sent: Monday, September 29, 2008 11:42:01 AM
Subject: Re: [PERFORM] Identical DB's, different execution plans 

Doug Eck <[EMAIL PROTECTED]> writes:
> Any ideas as to what could the first db to opt for the slower subquery rather 
> than the merge?

Not from the information given.  Presumably db1 thinks that the
mergejoin plan would be slower, but why it thinks that isn't clear yet.
Try setting enable_nestloop = off (and enable_hashjoin = off if it then
wants a hashjoin) and then post the EXPLAIN ANALYZE results.

regards, tom lane



  

Re: [PERFORM] Identical DB's, different execution plans

2008-09-29 Thread Tom Lane
Doug Eck <[EMAIL PROTECTED]> writes:
> The new plan from the slower db:

>->  Index Scan using t1_uc2 on t1  (cost=0.00..25604.74 rows=204906 
> width=32) (actual time=0.061..327.285 rows=8438 loops=1)
>  Filter: active

This seems a bit fishy.  In the first place, with such a simple filter
condition it shouldn't be that far off on the rowcount estimate.  In
the second place, the cost estimate is more than twice what the other
server estimates to do a seqscan and sort of the same data, and the
rowcount estimate is five times as much.  So there's something really
significantly different about the t1 tables in the two cases.

The first thing you ought to do is to look at the pg_class.relpages
and reltuples entries for t1 in both databases.  What I am suspecting is
that for some reason the "slow" db has suffered a lot of bloat in that
table, leading to a corresponding increase in the cost of a seqscan.
If so, a VACUUM FULL or CLUSTER should fix it, though you'll next need
to look into why routine vacuumings weren't happening.  (It looks like
t2 may be a bit bloated as well.)

If that's not it, we'll need to probe deeper ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Identical DB's, different execution plans

2008-09-29 Thread Doug Eck
Tom,

You nailed it.  The t1 table was using 9600 relpages versus 410 after the 
vacuum full.  The two databases are now showing similar execution plans and 
times.

Thanks for your help.  It is greatly appreciated.

Doug Eck



- Original Message 
From: Tom Lane <[EMAIL PROTECTED]>
To: Doug Eck <[EMAIL PROTECTED]>
Cc: pgsql-performance@postgresql.org
Sent: Monday, September 29, 2008 6:20:20 PM
Subject: Re: [PERFORM] Identical DB's, different execution plans 

Doug Eck <[EMAIL PROTECTED]> writes:
> The new plan from the slower db:

>->  Index Scan using t1_uc2 on t1  (cost=0.00..25604.74 rows=204906 
> width=32) (actual time=0.061..327.285 rows=8438 loops=1)
>  Filter: active

This seems a bit fishy.  In the first place, with such a simple filter
condition it shouldn't be that far off on the rowcount estimate.  In
the second place, the cost estimate is more than twice what the other
server estimates to do a seqscan and sort of the same data, and the
rowcount estimate is five times as much.  So there's something really
significantly different about the t1 tables in the two cases.

The first thing you ought to do is to look at the pg_class.relpages
and reltuples entries for t1 in both databases.  What I am suspecting is
that for some reason the "slow" db has suffered a lot of bloat in that
table, leading to a corresponding increase in the cost of a seqscan.
If so, a VACUUM FULL or CLUSTER should fix it, though you'll next need
to look into why routine vacuumings weren't happening.  (It looks like
t2 may be a bit bloated as well.)

If that's not it, we'll need to probe deeper ...

regards, tom lane