[PERFORM] Substring search using "exists" with a space in the search term

2009-03-03 Thread Hans Liebenberg




Hi,

I have come across a weird bug (I think) in postgres 8.1.11 (possibly
others).

Without going into the detail of my table structure, I will demonstrate
the problem with the select statements:

The following statement:
SELECT count(*)
FROM object o, object_version v, object_type ot 
where v.id = o.active_versionid and ot.id = o.object_typeid and
o.is_active ='t' and (o.is_archived = 'f' or o.is_archived is null) 
and o.is_published = 't' and ot.object_type_typeid <> 1 

and exists (
select ova.object_versionid from attribute_value av,
object_version_attribute ova where ova.attribute_valueid=av.id and
object_versionid = v.id 
and (upper(av.text_val) like '%KIWI%') )


runs fine and completes successfully.
BUT now this is the strange bit: if I have a space in my search term,
then postgres hangs for an indefinite period, e.g.:

SELECT count(*)
FROM object o, object_version v, object_type ot 
where v.id = o.active_versionid and ot.id = o.object_typeid and
o.is_active ='t' and (o.is_archived = 'f' or o.is_archived is null) 
and o.is_published = 't' and ot.object_type_typeid <> 1 

and exists (
select ova.object_versionid from attribute_value av,
object_version_attribute ova where ova.attribute_valueid=av.id and
object_versionid = v.id 
and (upper(av.text_val) like '%KIWI FRUIT%') )


Yet, if I modify the "exists" to an "in", all works well, as follows:

SELECT count(*)
FROM object o, object_version v, object_type ot 
where v.id = o.active_versionid and ot.id = o.object_typeid and
o.is_active ='t' and (o.is_archived = 'f' or o.is_archived is null) 
and o.is_published = 't' and ot.object_type_typeid <> 1 

and v.id in (
select ova.object_versionid from attribute_value av,
object_version_attribute ova where ova.attribute_valueid=av.id 
and (upper(av.text_val) like '%KIWI FRUIT%') )


So my question is: why would a space character cause postgres to hang
when using the exists clause?

I have tested this on several different servers and mostly get the same
result (v8.0.8 and v8.1.11); when I check the execution plan for either
query (space or no space), they are identical.

An upgrade to 8.3 fixes this, but I am still curious as to what could
cause such bizarre behavior.

Thanks
Hans
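
[Editorial sketch, not from the original thread: when a query appears to hang like this, it is worth checking from a second session whether the backend is actually executing (most likely under a pathological plan) rather than blocked on a lock. This assumes an 8.1-era server with stats_command_string enabled so that current_query is populated.]

```sql
-- Run from a second psql session while the "hung" query is executing.
-- Shows what each backend is doing (8.1-era column names).
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
ORDER BY query_start;

-- Check whether the backend is waiting on a lock rather than working.
SELECT l.pid, l.mode, l.granted, c.relname
FROM pg_locks l
LEFT JOIN pg_class c ON c.oid = l.relation
WHERE NOT l.granted;

-- If needed, cancel the runaway query without killing the backend:
-- SELECT pg_cancel_backend(<procpid>);
```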





[PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Sebastjan Trepca
Hey,

I have a table that links content together and it currently holds
about 17 million records. A typical query is a join of a content table
and a link table:

noovo-new=# explain analyze SELECT "core_accessor"."id",
"core_accessor"."content_type_id",
"core_accessor"."object_id", "core_accessor"."ordering",
"core_accessor"."label", "core_accessor"."date_posted",
"core_accessor"."publish_state", "core_accessor"."nooximity_old",
"core_accessor"."rising", "core_accessor"."nooximity",
"core_accessor"."nooximity_old_date_posted",
"core_accessor"."nooximity_date_posted", "core_accessor"."user_id",
"core_accessor"."slot_id", "core_accessor"."slot_type_id",
"core_accessor"."role", "core_base"."object_id",
"core_base"."content_type_id", "core_base"."abstract",
"core_base"."abstract_title", "core_base"."image",
 "core_base"."date_posted", "core_base"."date_modified",
"core_base"."date_expires", "core_base"."publish_state",
 "core_base"."location", "core_base"."location_x",
"core_base"."location_y", "core_base"."raw", "core_base"."author_id",
 "core_base"."excerpt", "core_base"."state_id",
"core_base"."country_id", "core_base"."language",
"core_base"."_identifier",
  "core_base"."slot_url", "core_base"."source_id",
"core_base"."source_content_type_id", "core_base"."source_type",
 "core_base"."source_value", "core_base"."source_title",
"core_base"."direct_to_source", "core_base"."comment_count",
 "core_base"."public" FROM "core_accessor" INNER JOIN core_base AS
core_base ON core_base.content_type_id =
 core_accessor.content_type_id AND core_base.object_id =
core_accessor.object_id WHERE (("core_accessor"."slot_type_id" = 119
 AND "core_accessor"."slot_id" = 472 AND "core_accessor"."label" = E''
AND "core_accessor"."publish_state" >= 60 AND
 "core_accessor"."role" IN (0) AND "core_accessor"."user_id" = 0))
order by core_accessor.date_posted, core_accessor.nooximity LIMIT 5
;

  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=31930.65..31930.66 rows=5 width=860) (actual
time=711.924..711.927 rows=5 loops=1)
   ->  Sort  (cost=31930.65..31937.80 rows=2861 width=860) (actual
time=711.923..711.923 rows=5 loops=1)
 Sort Key: core_accessor.date_posted, core_accessor.nooximity
 Sort Method:  top-N heapsort  Memory: 31kB
 ->  Nested Loop  (cost=0.00..31883.13 rows=2861 width=860)
(actual time=0.089..543.497 rows=68505 loops=1)
   ->  Index Scan using core_accessor_fresh_idx on
core_accessor  (cost=0.00..5460.07 rows=2970 width=92) (actual
time=0.068..54.921 rows=69312 loops=1)
 Index Cond: ((slot_id = 472) AND (slot_type_id =
119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
(publish_state >= 60))
   ->  Index Scan using core_base_pkey on core_base
(cost=0.00..8.88 rows=1 width=768) (actual time=0.004..0.005 rows=1
loops=69312)
 Index Cond: ((core_base.object_id =
core_accessor.object_id) AND (core_base.content_type_id =
core_accessor.content_type_id))
 Total runtime: 712.031 ms
(10 rows)

noovo-new=# select * from pg_stat_user_tables where relname='core_accessor';
-[ RECORD 1 ]----+------------------------------
relid            | 51159
schemaname       | public
relname          | core_accessor
seq_scan         | 58
seq_tup_read     | 749773516
idx_scan         | 13785608
idx_tup_fetch    | 149165183
n_tup_ins        | 9566
n_tup_upd        | 548
n_tup_del        | 347
n_tup_hot_upd    | 206
n_live_tup       | 17144303
n_dead_tup       | 251
last_vacuum      | 2009-03-03 07:02:19.733778-06
last_autovacuum  |
last_analyze     | 2009-03-03 06:17:47.784268-06
last_autoanalyze |
(1 row)

noovo-new=# \d+ core_accessor;
                                        Table "public.core_accessor"
     Column      |           Type           |                          Modifiers                          | Description
-----------------+--------------------------+-------------------------------------------------------------+-------------
 id              | bigint                   | not null default nextval('core_accessor_id_seq'::regclass) |
 flavor          | character varying(32)    |                                                             |
 content_type_id | integer                  | not null                                                    |
 object_id       | integer                  | not null                                                    |
 publish_state   | smallint                 | not null                                                    |
 date_posted     | timestamp with time zone | not null                                                    |
[...]

Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Robert Haas
On Tue, Mar 3, 2009 at 12:05 PM, Sebastjan Trepca  wrote:
> Hey,
>
> I have a table that links content together and it currently holds
> about 17 million records. A typical query is a join of a content table and
> a link table:
>
> [query and EXPLAIN ANALYZE output snipped]

Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Sebastjan Trepca
But it's already attached in the first mail, or am I missing something?

If you don't see it, check this: http://pastebin.com/d71b996d0

Sebastjan



On Tue, Mar 3, 2009 at 6:12 PM, Robert Haas  wrote:
> On Tue, Mar 3, 2009 at 12:05 PM, Sebastjan Trepca  wrote:
>> Hey,
>>
>> I have a table that links content together and it currently holds
>> about 17 million records. A typical query is a join of a content table and
>> a link table:
>>
>> [query and EXPLAIN ANALYZE output snipped]

Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-03 Thread Aaron Guyon
On Mon, Mar 2, 2009 at 10:23 PM, Tom Lane  wrote:

> Are you sure you are comparing apples to apples here?  Same configure
> options for the builds, same parameter values in postgresql.conf, both
> databases ANALYZEd, etc?  And are they running on the same hardware?
>

Thank you for looking at this Tom.  Yes, we have made sure we are comparing
apples to apples here.  The postgresql.confs are identical, as are the
configure flags:
--disable-debug --enable-shared --enable-thread-safety --with-perl
--with-pam --without-docdir --without-tcl --without-python --without-krb5
--without-ldap --without-bonjour --enable-integer-datetimes
--prefix=/opt/postgresql

However, the db was not analyzed.  I'll attach the new explain analyze of
the queries with the db analyzed, but 8.2 still beats 8.3.

The tests are both being run on the same machine, a Quad-core AMD Opteron
Processor 2212
(each with 1024 KB cache) and 4GB of RAM.

I find it telling that the query plan differs so much between postgres 8.2
and 8.3.  For example, why does the 8.3 planner choose to perform so many seq
scans?  I know seq scans are faster than index scans for small tables, but
these tables have 60K+ rows... surely an index scan would have been a better
choice here?  If you look at the 8.2 query plan, it is very clean in
comparison: index scans all the way through.  I can't help but think the 8.3
planner is simply failing to make the right choices in our case.  Another
question would be, why are there so many hash joins in the 8.3 plan now?
All our indexes are btrees...

Any light that can be shed on what's going on with the 8.3 planner would be
much appreciated.  Thanks in advance.
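
[Editorial sketch, not from the original thread: a common way to test whether the planner's preference for seq scans and hash joins is actually wrong is to discourage them for a single session and compare plans. The query placeholder below stands in for the slow query from this thread.]

```sql
-- Session-local experiment: strongly discourage seq scans and hash joins,
-- then compare the forced plan's runtime against the default plan.
BEGIN;
SET LOCAL enable_seqscan = off;
SET LOCAL enable_hashjoin = off;
EXPLAIN ANALYZE SELECT ...;   -- the slow query being compared across versions
ROLLBACK;
```

If the forced index-scan plan really is faster, the problem is usually a row-count misestimate or cost settings (random_page_cost, effective_cache_size) rather than the planner being unable to use the indexes.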


Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread David Wilson
On Tue, Mar 3, 2009 at 12:05 PM, Sebastjan Trepca  wrote:

>         ->  Nested Loop  (cost=0.00..31883.13 rows=2861 width=860)
> (actual time=0.089..543.497 rows=68505 loops=1)
>               ->  Index Scan using core_accessor_fresh_idx on
> core_accessor  (cost=0.00..5460.07 rows=2970 width=92) (actual
> time=0.068..54.921 rows=69312 loops=1)
>                     Index Cond: ((slot_id = 472) AND (slot_type_id =
> 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
> (publish_state >= 60))

That index scan on core_accessor_fresh_idx has a pretty big disparity
between what the planner expects to get (2970 rows) and what it
actually gets (69312 rows). You should try increasing the statistics
target if you haven't, then re-analyze and try the query again to see
if the planner picks something better. The default of 10 is pretty
small; try 100, or higher.



-- 
- David T. Wilson
david.t.wil...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-03 Thread david

On Tue, 3 Mar 2009, Aaron Guyon wrote:


On Mon, Mar 2, 2009 at 10:23 PM, Tom Lane  wrote:


Are you sure you are comparing apples to apples here?  Same configure
options for the builds, same parameter values in postgresql.conf, both
databases ANALYZEd, etc?  And are they running on the same hardware?



However, the db was not analyzed.  I'll attach the new explain analyze of
the queries with the db analyzed, but 8.2 still beats 8.3.

[rest of quoted message snipped]


If you haven't done a VACUUM ANALYZE on either installation, then postgres'
idea of what sort of data is in the database is unpredictable, and as a
result it's not surprising that the two systems guess differently about
what sort of plan is going to be most efficient.

Try doing a VACUUM ANALYZE on both databases and see what the results are.

David Lang
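
[Editorial sketch, not from the original thread: the suggested maintenance, plus a check that the statistics were actually refreshed.]

```sql
-- On each installation: rebuild visibility info and planner statistics,
-- then confirm the analyze timestamps were updated.
VACUUM ANALYZE;

SELECT relname, last_vacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```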

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Robert Haas
On Tue, Mar 3, 2009 at 12:20 PM, Sebastjan Trepca  wrote:
> But it's already attached in the first mail or am I missing something?
>
> If you don't see it, check this: http://pastebin.com/d71b996d0

Woops, sorry, I thought you had sent plain EXPLAIN.  I see it now.

The lowest level at which I see a problem is here:

->  Index Scan using core_accessor_fresh_idx on core_accessor
(cost=0.00..5460.07 rows=2970 width=92) (actual time=0.068..54.921
rows=69312 loops=1)
Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label =
''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))

For some reason it expects 2970 rows but gets 69312.

A good place to start is to change your default_statistics_target
value to 100 in postgresql.conf, restart postgresql, and re-ANALYZE.

...Robert
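
[Editorial sketch, not from the original thread: on 8.3 the setting can also be tried per session without editing postgresql.conf, since ANALYZE honors the session value for columns that have no per-column override.]

```sql
-- Try a higher statistics target without touching postgresql.conf.
SET default_statistics_target = 100;
ANALYZE core_accessor;

-- Then re-check the planner's estimate for the problem predicate:
EXPLAIN SELECT 1 FROM core_accessor
WHERE slot_id = 472 AND slot_type_id = 119 AND label = ''
  AND user_id = 0 AND "role" = 0 AND publish_state >= 60;
```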

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Sebastjan Trepca
Set statistics to 1000, re-analyzed, and got exactly the same results:


noovo-new=# explain analyze SELECT "core_accessor"."id",
"core_accessor"."content_type_id",
"core_accessor"."object_id", "core_accessor"."ordering",
"core_accessor"."label", "core_accessor"."date_posted",
"core_accessor"."publish_state", "core_accessor"."nooximity_old",
"core_accessor"."rising", "core_accessor"."nooximity",
"core_accessor"."nooximity_old_date_posted",
"core_accessor"."nooximity_date_posted", "core_accessor"."user_id",
"core_accessor"."slot_id", "core_accessor"."slot_type_id",
"core_accessor"."role", "core_base"."object_id",
"core_base"."content_type_id", "core_base"."abstract",
"core_base"."abstract_title", "core_base"."image",
 "core_base"."date_posted", "core_base"."date_modified",
"core_base"."date_expires", "core_base"."publish_state",
 "core_base"."location", "core_base"."location_x",
"core_base"."location_y", "core_base"."raw", "core_base"."author_id",
 "core_base"."excerpt", "core_base"."state_id",
"core_base"."country_id", "core_base"."language",
"core_base"."_identifier",
  "core_base"."slot_url", "core_base"."source_id",
"core_base"."source_content_type_id", "core_base"."source_type",
 "core_base"."source_value", "core_base"."source_title",
"core_base"."direct_to_source", "core_base"."comment_count",
 "core_base"."public" FROM "core_accessor" INNER JOIN core_base AS
core_base ON core_base.content_type_id =
 core_accessor.content_type_id AND core_base.object_id =
core_accessor.object_id WHERE (("core_accessor"."slot_type_id" = 119
 AND "core_accessor"."slot_id" = 472 AND "core_accessor"."label" = E''
AND "core_accessor"."publish_state" >= 60 AND
 "core_accessor"."role" IN (0) AND "core_accessor"."user_id" = 0))
order by core_accessor.date_posted, core_accessor.nooximity LIMIT 5
;

  QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=31716.13..31716.14 rows=5 width=860) (actual
time=711.340..711.343 rows=5 loops=1)
   ->  Sort  (cost=31716.13..31722.19 rows=2424 width=860) (actual
time=711.339..711.339 rows=5 loops=1)
 Sort Key: core_accessor.date_posted, core_accessor.nooximity
 Sort Method:  top-N heapsort  Memory: 31kB
 ->  Nested Loop  (cost=0.00..31675.87 rows=2424 width=860)
(actual time=0.076..544.039 rows=68505 loops=1)
   ->  Index Scan using core_accessor_fresh_idx on
core_accessor  (cost=0.00..9234.77 rows=2511 width=92) (actual
time=0.058..55.225 rows=69312 loops=1)
 Index Cond: ((slot_id = 472) AND (slot_type_id =
119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
(publish_state >= 60))
   ->  Index Scan using core_base_pkey on core_base
(cost=0.00..8.92 rows=1 width=768) (actual time=0.005..0.005 rows=1
loops=69312)
 Index Cond: ((core_base.object_id =
core_accessor.object_id) AND (core_base.content_type_id =
core_accessor.content_type_id))
 Total runtime: 711.443 ms
(10 rows)



This is how I did it:

noovo-new=# alter table core_accessor alter column slot_id set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column slot_type_id set
statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column label set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column user_id set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column role set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column publish_state set
statistics 1000;
ALTER TABLE
noovo-new=# analyze core_accessor;
ANALYZE



Sebastjan



On Tue, Mar 3, 2009 at 6:34 PM, David Wilson  wrote:
> On Tue, Mar 3, 2009 at 12:05 PM, Sebastjan Trepca  wrote:
>
>>         ->  Nested Loop  (cost=0.00..31883.13 rows=2861 width=860)
>> (actual time=0.089..543.497 rows=68505 loops=1)
>>               ->  Index Scan using core_accessor_fresh_idx on
>> core_accessor  (cost=0.00..5460.07 rows=2970 width=92) (actual
>> time=0.068..54.921 rows=69312 loops=1)
>>                     Index Cond: ((slot_id = 472) AND (slot_type_id =
>> 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
>> (publish_state >= 60))
>
> That index scan on core_accessor_fresh_idx has a pretty big disparity
> between what the planner expects to get (2970 rows) and what it
> actually gets (69312 rows). You should try increasing the statistics
> target if you haven't, then re-analyze and try the query again to see
> if the planner picks something better. The default of 10 is pretty
> small; try 100, or higher.
>
>
>
> --
> - David T. Wilson
> david.t.wil...@gmail.com
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-03 Thread Aaron Guyon
On Tue, Mar 3, 2009 at 12:38 PM,  wrote:

> If you haven't done a VACUUM ANALYZE on either installation, then postgres'
> idea of what sort of data is in the database is unpredictable, and as a
> result it's not surprising that the two systems guess differently about
> what sort of plan is going to be most efficient.
>
> Try doing a VACUUM ANALYZE on both databases and see what the results are.
>
> David Lang
>

These are the results with vacuum analyze:
8.2.12: 624.366 ms
8.3.3: 1273.601 ms





                                   QUERY PLAN
--------------------------------------------------------------------------------
 Unique  (cost=9832.91..9832.92 rows=2 width=52) (actual time=623.808..623.808 
rows=0 loops=1)
   ->  Sort  (cost=9832.91..9832.92 rows=2 width=52) (actual 
time=623.799..623.799 rows=0 loops=1)
 Sort Key: t8.id
 ->  Nested Loop  (cost=0.00..9832.90 rows=2 width=52) (actual 
time=623.467..623.467 rows=0 loops=1)
   Join Filter: ((t2.not_modified_since >= 
to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR 
(t3.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD 
HH:MI:SS'::text)) OR (t6.not_modified_since >= 
to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR 
(t7.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD 
HH:MI:SS'::text)) OR (t8.not_modified_since >= 
to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR 
(t9.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD 
HH:MI:SS'::text)) OR (t10.not_modified_since >= 
to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR 
(t11.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 
'YYYY/MM/DD HH:MI:SS'::text)))
   ->  Nested Loop  (cost=0.00..7381.49 rows=598 width=108) (actual 
time=75.634..556.642 rows=1104 loops=1)
 ->  Nested Loop  (cost=0.00..5087.76 rows=602 width=59) 
(actual time=71.744..523.690 rows=1104 loops=1)
   ->  Nested Loop  (cost=0.00..1171.66 rows=165 
width=51) (actual time=66.427..499.798 rows=332 loops=1)
 ->  Nested Loop  (cost=0.00..684.77 rows=117 
width=54) (actual time=38.266..440.024 rows=336 loops=1)
   ->  Nested Loop  (cost=0.00..84.08 
rows=182 width=35) (actual time=16.785..402.046 rows=376 loops=1)
 ->  Nested Loop  (cost=0.00..52.33 
rows=4 width=27) (actual time=11.355..11.547 rows=4 loops=1)
   ->  Index Scan using 
idx_day_part_du on day_part t10  (cost=0.00..8.28 rows=7 width=19) (actual 
time=0.713..0.748 rows=7 loops=1)
 Index Cond: 
(display_unit_id = 250893::numeric)
 Filter: (active <> 
0::numeric)
   ->  Index Scan using 
idx_skin_day_part_id on skin t2  (cost=0.00..6.28 rows=1 width=30) (actual 
time=1.526..1.529 rows=1 loops=7)
 Index Cond: 
(t2.day_part_id = t10.id)
 Filter: (active <> 
0::numeric)
 ->  Index Scan using 
idx_skin_slot_skin_id on skin

Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Sebastjan Trepca
Still the same :/

I raised the default_statistics_target to 600 (it was already 100). I
then restarted pg and ran ANALYZE over all tables, and yet there is no
effect.
This is the output for core_accessor:
INFO:  analyzing "public.core_accessor"
INFO:  "core_accessor": scanned 291230 of 291230 pages, containing
17144315 live rows and 0 dead rows; 30 rows in sample, 17144315
estimated total rows

It thinks there are even fewer rows in the set:

--------------------------------------------------------------------------------
 Limit  (cost=30816.49..30816.50 rows=5 width=855) (actual
time=683.907..683.910 rows=5 loops=1)
   ->  Sort  (cost=30816.49..30822.29 rows=2321 width=855) (actual
time=683.906..683.907 rows=5 loops=1)
 Sort Key: core_accessor.date_posted, core_accessor.nooximity
 Sort Method:  top-N heapsort  Memory: 31kB
 ->  Nested Loop  (cost=0.00..30777.94 rows=2321 width=855)
(actual time=0.072..517.970 rows=68505 loops=1)
   ->  Index Scan using core_accessor_fresh_idx on
core_accessor  (cost=0.00..8955.44 rows=2440 width=92) (actual
time=0.056..53.107 rows=69312 loops=1)
 Index Cond: ((slot_id = 472) AND (slot_type_id =
119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
(publish_state >= 60))
   ->  Index Scan using core_base_pkey on core_base
(cost=0.00..8.93 rows=1 width=763) (actual time=0.004..0.005 rows=1
loops=69312)
 Index Cond: ((core_base.object_id =
core_accessor.object_id) AND (core_base.content_type_id =
core_accessor.content_type_id))
 Total runtime: 684.015 ms
(10 rows)





Sebastjan



On Tue, Mar 3, 2009 at 6:40 PM, Robert Haas  wrote:
> On Tue, Mar 3, 2009 at 12:20 PM, Sebastjan Trepca  wrote:
>> But it's already attached in the first mail or am I missing something?
>>
>> If you don't see it, check this: http://pastebin.com/d71b996d0
>
> Woops, sorry, I thought you had sent plain EXPLAIN.  I see it now.
>
> The lowest level at which I see a problem is here:
>
> ->  Index Scan using core_accessor_fresh_idx on core_accessor
> (cost=0.00..5460.07 rows=2970 width=92) (actual time=0.068..54.921
> rows=69312 loops=1)
>    Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label =
> ''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))
>
> For some reason it expects 2970 rows but gets 69312.
>
> A good place to start is to change your default_statistics_target
> value to 100 in postgresql.conf, restart postgresql, and re-ANALYZE.
>
> ...Robert
>

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Sebastjan Trepca
Maybe this is useful: I removed the JOIN and it now uses another
index (core_accessor_date_idx, on (date_posted, nooximity)), but
it's still hardly any better:

noovo-new=# explain analyze SELECT * FROM "core_accessor" WHERE
(("core_accessor"."slot_type_id" = 119
noovo-new(#  AND "core_accessor"."slot_id" = 472 AND
"core_accessor"."label" = E'' AND "core_accessor"."publish_state" >=
60 AND
noovo-new(#  "core_accessor"."role" IN (0) AND
"core_accessor"."user_id" = 0)) ORDER BY "core_accessor"."date_posted"
DESC, "core_accessor"."nooximity" DESC LIMIT 5
noovo-new-# ;

   QUERY PLAN
--
 Limit  (cost=0.00..3709.56 rows=5 width=178) (actual
time=4593.867..4597.587 rows=5 loops=1)
   ->  Index Scan Backward using core_accessor_date_idx on
core_accessor  (cost=0.00..1810265.67 rows=2440 width=178) (actual
time=4593.866..4597.583 rows=5 loops=1)
 Filter: ((publish_state >= 60) AND (slot_type_id = 119) AND
(slot_id = 472) AND (label = ''::text) AND (role = 0) AND (user_id =
0))
 Total runtime: 4597.632 ms
(4 rows)
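The row estimates driving plans like this come from the per-column statistics; one way to inspect what the planner knows (a sketch, assuming the table lives in the public schema) is:

```sql
-- Sketch: look at the stored statistics for each filtered column.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'core_accessor'
  AND attname IN ('slot_id', 'slot_type_id', 'label',
                  'user_id', 'role', 'publish_state');
```

If n_distinct is badly off for any of these columns, raising that column's statistics target and re-ANALYZEing is the usual first step.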


Sebastjan



On Tue, Mar 3, 2009 at 8:05 PM, Sebastjan Trepca  wrote:
> Still the same :/
>
> I raised the default_statistics_target to 600 (it was already 100). I
> then restarted pg, ran analyze through all tables, and yet there is no
> effect.
> This is the output for core_accessor:
> INFO:  analyzing "public.core_accessor"
> INFO:  "core_accessor": scanned 291230 of 291230 pages, containing
> 17144315 live rows and 0 dead rows; 30 rows in sample, 17144315
> estimated total rows
>
> It thinks there are even less rows in the set:
>
> -
>  Limit  (cost=30816.49..30816.50 rows=5 width=855) (actual
> time=683.907..683.910 rows=5 loops=1)
>   ->  Sort  (cost=30816.49..30822.29 rows=2321 width=855) (actual
> time=683.906..683.907 rows=5 loops=1)
>         Sort Key: core_accessor.date_posted, core_accessor.nooximity
>         Sort Method:  top-N heapsort  Memory: 31kB
>         ->  Nested Loop  (cost=0.00..30777.94 rows=2321 width=855)
> (actual time=0.072..517.970 rows=68505 loops=1)
>               ->  Index Scan using core_accessor_fresh_idx on
> core_accessor  (cost=0.00..8955.44 rows=2440 width=92) (actual
> time=0.056..53.107 rows=69312 loops=1)
>                     Index Cond: ((slot_id = 472) AND (slot_type_id =
> 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
> (publish_state >= 60))
>               ->  Index Scan using core_base_pkey on core_base
> (cost=0.00..8.93 rows=1 width=763) (actual time=0.004..0.005 rows=1
> loops=69312)
>                     Index Cond: ((core_base.object_id =
> core_accessor.object_id) AND (core_base.content_type_id =
> core_accessor.content_type_id))
>  Total runtime: 684.015 ms
> (10 rows)
>
>
>
>
>
> Sebastjan
>
>
>
> On Tue, Mar 3, 2009 at 6:40 PM, Robert Haas  wrote:
>> On Tue, Mar 3, 2009 at 12:20 PM, Sebastjan Trepca  wrote:
>>> But it's already attached in the first mail or am I missing something?
>>>
>>> If you don't see it, check this: http://pastebin.com/d71b996d0
>>
>> Woops, sorry, I thought you had sent plain EXPLAIN.  I see it now.
>>
>> The lowest level at which I see a problem is here:
>>
>> ->  Index Scan using core_accessor_fresh_idx on core_accessor
>> (cost=0.00..5460.07 rows=2970 width=92) (actual time=0.068..54.921
>> rows=69312 loops=1)
>>    Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label =
>> ''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))
>>
>> For some reason it expects 2970 rows but gets 69312.
>>
>> A good place to start is to change your default_statistics_target
>> value to 100 in postgresql.conf, restart postgresql, and re-ANALYZE.
>>
>> ...Robert
>>
>



Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Robert Haas
On Tue, Mar 3, 2009 at 2:16 PM, Sebastjan Trepca  wrote:
> Maybe this is useful: I removed the JOIN and it now uses another
> index (core_accessor_date_idx, on (date_posted, nooximity)), but
> it's still hardly any better:
>
> noovo-new=# explain analyze SELECT * FROM "core_accessor" WHERE
> (("core_accessor"."slot_type_id" = 119
> noovo-new(#  AND "core_accessor"."slot_id" = 472 AND
> "core_accessor"."label" = E'' AND "core_accessor"."publish_state" >=
> 60 AND
> noovo-new(#  "core_accessor"."role" IN (0) AND
> "core_accessor"."user_id" = 0)) ORDER BY "core_accessor"."date_posted"
> DESC, "core_accessor"."nooximity" DESC LIMIT 5
> noovo-new-# ;
>
>       QUERY PLAN
> --
>  Limit  (cost=0.00..3709.56 rows=5 width=178) (actual
> time=4593.867..4597.587 rows=5 loops=1)
>   ->  Index Scan Backward using core_accessor_date_idx on
> core_accessor  (cost=0.00..1810265.67 rows=2440 width=178) (actual
> time=4593.866..4597.583 rows=5 loops=1)
>         Filter: ((publish_state >= 60) AND (slot_type_id = 119) AND
> (slot_id = 472) AND (label = ''::text) AND (role = 0) AND (user_id =
> 0))
>  Total runtime: 4597.632 ms
> (4 rows)
>
>
> Sebastjan

Well, in that case, you are being bitten by the fact that our
multi-column selectivity estimates are not very good.  The planner has
good information on how each column behaves in isolation, but not how
they act together.  I've found this to be a very difficult problem to
fix.

Which of the parameters in this query vary and which ones are
typically always the same?  Sometimes you can improve things by
creating an appropriate partial index.
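As a concrete sketch of that idea (hypothetical index; the column roles are inferred from the EXPLAIN output earlier in the thread, and DESC index columns assume PostgreSQL 8.3+):

```sql
-- Hypothetical sketch, not a definitive recommendation: if
-- slot_type_id, label, user_id, role, and the publish_state cutoff are
-- effectively constant, a partial index leaving only the varying
-- column in the key lets the planner filter cheaply AND satisfy the
-- ORDER BY without a sort.
CREATE INDEX core_accessor_slot_date_idx
    ON core_accessor (slot_id, date_posted DESC, nooximity DESC)
    WHERE slot_type_id = 119
      AND label = ''
      AND user_id = 0
      AND role = 0
      AND publish_state >= 60;
```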

...Robert



Re: [PERFORM] Problems with ordering (can't force query planner to use an index)

2009-03-03 Thread Tom Lane
Sebastjan Trepca  writes:
> It thinks there are even less rows in the set:

>->  Index Scan using core_accessor_fresh_idx on
> core_accessor  (cost=0.00..8955.44 rows=2440 width=92) (actual
> time=0.056..53.107 rows=69312 loops=1)
>  Index Cond: ((slot_id = 472) AND (slot_type_id =
> 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
> (publish_state >= 60))

Maybe you should get rid of this six-column index, if you'd rather the
query didn't use it.  It seems a tad overspecialized anyway.
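If the index really is only hurting, removing it is a one-liner (a sketch; verify no other query depends on it first):

```sql
-- Sketch: drop the overspecialized six-column index so the planner
-- stops preferring it (index name taken from the plan above).
DROP INDEX core_accessor_fresh_idx;
```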

regards, tom lane



Re: [PERFORM] Postgres 8.3, four times slower queries?

2009-03-03 Thread Tom Lane
Aaron Guyon  writes:
> I find it telling that the query plan differs so much between postgres 8.2 and 8.3.

Well, you haven't shown us either the query or the table definitions,
so we're just guessing in the dark.  However, the occurrences of
"::numeric" in the query plan make me wonder whether all of your join
keys are numeric type.  If so, the reason 8.2 didn't use any hash joins
is that it couldn't --- it didn't have a hash method for numerics.  8.3
does and therefore has more flexibility of plan choice.  Comparisons on
numerics aren't terribly fast though (in either release).  I wonder
whether you could change the key columns to int or bigint.

I also find it a tad fishy that both releases are choosing *exactly* the
same join order when there is hardly anything else that is identical
about the plans --- given the cross-release variance in rowcount
estimates etc I'd have expected at least one difference.  Are you doing
something to force the join order, like running with a small
join_collapse_limit setting?  If so maybe you shouldn't.
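A sketch of both suggestions (the table and column names here are hypothetical):

```sql
-- Convert a numeric join key to bigint, assuming the values are
-- integral; comparisons and hashing on bigint are much cheaper.
ALTER TABLE orders ALTER COLUMN customer_id TYPE bigint;

-- Check whether the join order is being artificially constrained:
SHOW join_collapse_limit;   -- the default is 8
```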

regards, tom lane



[PERFORM] work_mem in high transaction rate database

2009-03-03 Thread Flavio Henrique Araque Gurgel

Hello all 

In a dedicated server with 16 cores and 16GB of RAM running PostgreSQL 8.2.5 we 
have a database with basically these kinds of transactions: 
- short transactions with a couple of updates and inserts that run all 
day; 
- batch data loads with hundreds of inserts that run several times a day; 
- one delete of thousands of rows after each batch; 
- selects are made when users need reports; low concurrency here. 

Today max_connections is ~2500, where the application is a cluster of JBoss 
servers with a pool a bit smaller than this total. 
work_mem = 1GB 
maintenance_work_mem = 1GB 
shared_buffers = 4GB 

autovacuum takes a lot of time running in the largest tables (3 large tables in 
50) causing some connections to have to wait for it to finish to start 
transactioning again. 

I see a few processes (connections) using 10 ~ 20% of total system memory and 
the others using no more then 1%. 

What I want to ask is: is it better to keep work_mem as high as it is today, 
or is it a safe bet trying to reduce this number, for example, to 1 or 2MB, so I 
can keep the distribution of memory more balanced among all connections? 

Thanks! 

Flavio Henrique A. Gurgel 



Re: [PERFORM] work_mem in high transaction rate database

2009-03-03 Thread Scott Marlowe
On Tue, Mar 3, 2009 at 5:28 PM, Flavio Henrique Araque Gurgel
 wrote:
> Hello all
>
> In a dedicated server with 16 cores and 16GB of RAM running PostgreSQL 8.2.5
> we have a database with basically two kinds of transactions:
> - short transactions with a couple of updates and inserts that runs all the
> day;
> - batch data loads with hundreds of inserts that runs several times a day;
> - one delete for thousands of lines after each batch;
> - selects are made when users need reports, low concurrency here.
>
> Today the max_connections is ~2500 where the application is a cluster of
> JBoss servers with a pool a bit smaller then this total.
> work_mem = 1GB
> maintenance_work_mem = 1GB
> shared_buffers = 4GB

Oh my lord, that is a foot gun waiting to go off.  Assuming 2k
connections, if somehow a fair number of them went active with big
sorts, you'd be able to exhaust all physical memory with about 8 to
16 connections.  Lower work_mem now, to something like 1 to 4 meg.  Do
not pass go.  If some oddball query really needs a lot of work_mem,
and benchmarks show a larger work_mem helps, consider raising the
work_mem setting for that one query to something under 1G (way under
1G), if that makes it noticeably faster.  Don't allocate more than a
test shows you helps.

> autovacuum takes a lot of time running in the largest tables (3 large tables
> in 50) causing some connections to have to wait for it to finish to start
> transactioning again.

Vacuum does not block transactions, unless you're dropping tables or something.

> I see a few processes (connections) using 10 ~ 20% of total system memory
> and the others using no more then 1%.

This is commonly misread.  It has to do with the vagaries of shared
memory allocation and accounting.  The numbers quite likely don't mean
what you think they mean.  Post the first 20 or so lines from top to
show us.

> What I want to ask is: is it better to keep the work_mem as high as it is
> today or is it a safe bet triyng to reduce this number, for example, to 1 or
> 2MB so I can keep the distribution of memory more balanced among all
> connections?

Your work_mem is dangerously high.  Your current reading of top may
not actually support lowering it directly.  Since you've got 4G of
shared_buffers allocated, any process that's touched all or most of
shared_buffers will show as using 4G of RAM.  That's why you should
post the output of top, or google linux virtual memory and top to
learn what the numbers mean.

Let's say that 1% of your queries can benefit from > 100 meg of
work_mem, 5% from 60M, 10% from 40M, 20% from 20M, 30% from 16M, 50%
from 8M, and 4M is enough for everything else to do well.

If, somehow, 100 queries fired off that could use > 100 meg, they
might, with your current settings, use all your memory and start using
swap until swap ran out and they started getting out-of-memory errors
and failing.  This would affect all the other queries on the machine
as well.

OTOH, if you had work_mem limited to 16M, and 100 of those same
queries fired off, they'd individually run a little slower, but they
wouldn't be able to run the machine out of memory.

If your work_mem multiplied by max_connections is greater than some
fraction of physical memory, you're doing it wrong and setting your
machine up for mysterious, heavy-load failures, the worst kind.
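The safe pattern is to keep the global setting small and raise it per-session only where benchmarks justify it; a sketch (the values and the query are illustrative assumptions, not recommendations):

```sql
-- postgresql.conf would carry a small global default, e.g. work_mem = 4MB.

-- For the one report query that demonstrably needs a big sort:
SET work_mem = '64MB';     -- hypothetical value; benchmark before using
SELECT user_id, count(*)   -- hypothetical reporting query
FROM big_table
GROUP BY user_id
ORDER BY count(*) DESC;
RESET work_mem;            -- drop back to the safe default
```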



Re: [PERFORM] work_mem in high transaction rate database

2009-03-03 Thread Akos Gabriel
On Tue, 3 Mar 2009 18:37:42 -0700, Scott Marlowe wrote:


> Oh my lord, that is a foot gun waiting to go off.  Assuming 2k
> connections, and somehow a fair number of them went active with big

I absolutely agree with Scott. Plus, set effective_cache_size
accordingly; this would help the planner. You can read a lot about
setting this in the mailing list archives.
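For a 16GB machine like this one, a starting point might look like the following (a sketch; the values are assumptions to tune against the actual workload, not recommendations):

```
# postgresql.conf sketch for a dedicated 16GB server
shared_buffers = 4GB
effective_cache_size = 12GB   # roughly shared_buffers + expected OS page cache
```

effective_cache_size doesn't allocate anything; it only tells the planner how much caching to assume when costing index scans.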

-- 
Regards,
Gábriel Ákos
-=E-Mail :akos.gabr...@i-logic.hu|Web:  http://www.i-logic.hu=-
-=Tel/fax:+3612367353|Mobil:+36209278894 =-
