[PERFORM] Substring search using "exists" with a space in the search term
Hi,

I have come across a weird bug (I think) in Postgres 8.1.11 (possibly others). Without going into my table structure in detail, I will demonstrate the problem with the following select statements. This statement runs fine and executes successfully:

SELECT count(*)
FROM object o, object_version v, object_type ot
WHERE v.id = o.active_versionid
  AND ot.id = o.object_typeid
  AND o.is_active = 't'
  AND (o.is_archived = 'f' OR o.is_archived IS NULL)
  AND o.is_published = 't'
  AND ot.object_type_typeid <> 1
  AND EXISTS (
    SELECT ova.object_versionid
    FROM attribute_value av, object_version_attribute ova
    WHERE ova.attribute_valueid = av.id
      AND object_versionid = v.id
      AND (upper(av.text_val) LIKE '%KIWI%')
  )

BUT, and this is the strange bit, if I have a space in my search term then Postgres hangs for an indefinite period:

SELECT count(*)
FROM object o, object_version v, object_type ot
WHERE v.id = o.active_versionid
  AND ot.id = o.object_typeid
  AND o.is_active = 't'
  AND (o.is_archived = 'f' OR o.is_archived IS NULL)
  AND o.is_published = 't'
  AND ot.object_type_typeid <> 1
  AND EXISTS (
    SELECT ova.object_versionid
    FROM attribute_value av, object_version_attribute ova
    WHERE ova.attribute_valueid = av.id
      AND object_versionid = v.id
      AND (upper(av.text_val) LIKE '%KIWI FRUIT%')
  )

Yet if I change the EXISTS to an IN, all works well, as follows:

SELECT count(*)
FROM object o, object_version v, object_type ot
WHERE v.id = o.active_versionid
  AND ot.id = o.object_typeid
  AND o.is_active = 't'
  AND (o.is_archived = 'f' OR o.is_archived IS NULL)
  AND o.is_published = 't'
  AND ot.object_type_typeid <> 1
  AND v.id IN (
    SELECT ova.object_versionid
    FROM attribute_value av, object_version_attribute ova
    WHERE ova.attribute_valueid = av.id
      AND (upper(av.text_val) LIKE '%KIWI FRUIT%')
  )

So my question is: why would a space character cause Postgres to hang when using the EXISTS clause? I have tested this on several different servers and mostly get the same result (8.0.8 and 8.1.11), and when I check the execution plan for either query (space or no space) they are identical. An upgrade to 8.3 fixes this, but I am still curious as to what could cause such bizarre behavior.

Thanks
Hans
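For readers hitting the same symptom: the correlated EXISTS and the uncorrelated IN forms are semantically interchangeable here, which is why the rewrite is a safe workaround while the real fix is the server upgrade. A minimal sketch of the equivalence, using Python's sqlite3 purely for illustration (the table and column names below are simplified stand-ins for the schema above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE object_version (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE attribute_value (object_versionid INTEGER, text_val TEXT)")
cur.executemany("INSERT INTO object_version VALUES (?)", [(i,) for i in range(1, 6)])
cur.executemany("INSERT INTO attribute_value VALUES (?, ?)",
                [(1, "kiwi fruit"), (2, "kiwi"), (3, "banana")])

# Correlated EXISTS form (as in the original query): for each version row,
# probe the subquery with v.id.
exists_count = cur.execute("""
    SELECT count(*) FROM object_version v
    WHERE EXISTS (SELECT 1 FROM attribute_value av
                  WHERE av.object_versionid = v.id
                    AND upper(av.text_val) LIKE '%KIWI FRUIT%')""").fetchone()[0]

# Equivalent uncorrelated IN form (the workaround): build the matching id
# set once, then test membership.
in_count = cur.execute("""
    SELECT count(*) FROM object_version v
    WHERE v.id IN (SELECT av.object_versionid FROM attribute_value av
                   WHERE upper(av.text_val) LIKE '%KIWI FRUIT%')""").fetchone()[0]

print(exists_count, in_count)  # both forms count the same versions
```

The space in the LIKE pattern is just ordinary pattern text in both forms; any difference in behavior between them points at the server's subquery execution, not the query semantics.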
[PERFORM] Problems with ordering (can't force query planner to use an index)
Hey,

I have a table that links content together and it currently holds about 17 million records. A typical query is a join between a content table and the link table:

noovo-new=# explain analyze SELECT "core_accessor"."id",
"core_accessor"."content_type_id", "core_accessor"."object_id",
"core_accessor"."ordering", "core_accessor"."label",
"core_accessor"."date_posted", "core_accessor"."publish_state",
"core_accessor"."nooximity_old", "core_accessor"."rising",
"core_accessor"."nooximity", "core_accessor"."nooximity_old_date_posted",
"core_accessor"."nooximity_date_posted", "core_accessor"."user_id",
"core_accessor"."slot_id", "core_accessor"."slot_type_id",
"core_accessor"."role", "core_base"."object_id",
"core_base"."content_type_id", "core_base"."abstract",
"core_base"."abstract_title", "core_base"."image",
"core_base"."date_posted", "core_base"."date_modified",
"core_base"."date_expires", "core_base"."publish_state",
"core_base"."location", "core_base"."location_x", "core_base"."location_y",
"core_base"."raw", "core_base"."author_id", "core_base"."excerpt",
"core_base"."state_id", "core_base"."country_id", "core_base"."language",
"core_base"."_identifier", "core_base"."slot_url", "core_base"."source_id",
"core_base"."source_content_type_id", "core_base"."source_type",
"core_base"."source_value", "core_base"."source_title",
"core_base"."direct_to_source", "core_base"."comment_count",
"core_base"."public"
FROM "core_accessor"
INNER JOIN core_base AS core_base
    ON core_base.content_type_id = core_accessor.content_type_id
   AND core_base.object_id = core_accessor.object_id
WHERE "core_accessor"."slot_type_id" = 119
  AND "core_accessor"."slot_id" = 472
  AND "core_accessor"."label" = E''
  AND "core_accessor"."publish_state" >= 60
  AND "core_accessor"."role" IN (0)
  AND "core_accessor"."user_id" = 0
ORDER BY core_accessor.date_posted, core_accessor.nooximity
LIMIT 5;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit (cost=31930.65..31930.66 rows=5 width=860) (actual time=711.924..711.927 rows=5 loops=1)
   -> Sort (cost=31930.65..31937.80 rows=2861 width=860) (actual time=711.923..711.923 rows=5 loops=1)
         Sort Key: core_accessor.date_posted, core_accessor.nooximity
         Sort Method: top-N heapsort Memory: 31kB
         -> Nested Loop (cost=0.00..31883.13 rows=2861 width=860) (actual time=0.089..543.497 rows=68505 loops=1)
               -> Index Scan using core_accessor_fresh_idx on core_accessor (cost=0.00..5460.07 rows=2970 width=92) (actual time=0.068..54.921 rows=69312 loops=1)
                     Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))
               -> Index Scan using core_base_pkey on core_base (cost=0.00..8.88 rows=1 width=768) (actual time=0.004..0.005 rows=1 loops=69312)
                     Index Cond: ((core_base.object_id = core_accessor.object_id) AND (core_base.content_type_id = core_accessor.content_type_id))
 Total runtime: 712.031 ms
(10 rows)

noovo-new=# select * from pg_stat_user_tables where relname='core_accessor';
 relid:            51159
 schemaname:       public
 relname:          core_accessor
 seq_scan:         58
 seq_tup_read:     749773516
 idx_scan:         13785608
 idx_tup_fetch:    149165183
 n_tup_ins:        9566
 n_tup_upd:        548
 n_tup_del:        347
 n_tup_hot_upd:    206
 n_live_tup:       17144303
 n_dead_tup:       251
 last_vacuum:      2009-03-03 07:02:19.733778-06
 last_autovacuum:
 last_analyze:     2009-03-03 06:17:47.784268-06
 last_autoanalyze:
(1 row)

noovo-new=# \d+ core_accessor;
                         Table "public.core_accessor"
     Column      |           Type           | Modifiers
-----------------+--------------------------+------------------------------------------------------------
 id              | bigint                   | not null default nextval('core_accessor_id_seq'::regclass)
 flavor          | character varying(32)    |
 content_type_id | integer                  | not null
 object_id       | integer                  | not null
 publish_state   | smallint                 | not null
 date_posted     | timestamp with time zone | not null
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
On Tue, Mar 3, 2009 at 12:05 PM, Sebastjan Trepca wrote:
> Hey,
>
> I have a table that links content together and it currently holds
> about 17 mio records. Typical query is a join with a content table and
> link table:
>
> [query, EXPLAIN ANALYZE output, pg_stat_user_tables and \d+ output
> quoted in full in the original message, trimmed]
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
But it's already attached in the first mail, or am I missing something?

If you don't see it, check this: http://pastebin.com/d71b996d0

Sebastjan

On Tue, Mar 3, 2009 at 6:12 PM, Robert Haas wrote:
> On Tue, Mar 3, 2009 at 12:05 PM, Sebastjan Trepca wrote:
>> Hey,
>>
>> I have a table that links content together and it currently holds
>> about 17 mio records. Typical query is a join with a content table and
>> link table:
>>
>> [query, EXPLAIN ANALYZE output, pg_stat_user_tables and \d+ output
>> quoted in full in the original message, trimmed]
Re: [PERFORM] Postgres 8.3, four times slower queries?
On Mon, Mar 2, 2009 at 10:23 PM, Tom Lane wrote:
> Are you sure you are comparing apples to apples here? Same configure
> options for the builds, same parameter values in postgresql.conf, both
> databases ANALYZEd, etc? And are they running on the same hardware?

Thank you for looking at this, Tom. Yes, we have made sure we are comparing apples to apples here. The postgresql.conf files are identical, as are the configure flags:

--disable-debug --enable-shared --enable-thread-safety --with-perl
--with-pam --without-docdir --without-tcl --without-python --without-krb5
--without-ldap --without-bonjour --enable-integer-datetimes
--prefix=/opt/postgresql

However, the db was not analyzed. I'll attach the new EXPLAIN ANALYZE of the queries with the db analyzed, but 8.2 still beats 8.3. The tests are both being run on the same machine, a quad-core AMD Opteron 2212 (each core with 1024 KB cache) and 4 GB of RAM.

I find it telling that the query plan differs so much between Postgres 8.2 and 8.3. For example, why does the 8.3 planner choose to perform so many seq scans? I know seq scans are faster than index scans for small tables, but these tables have 60K+ rows... surely an index scan would have been a better choice here? If you look at the 8.2 query plan, it is very clean in comparison: index scans all the way through. I can't help but think the 8.3 planner is simply failing to make the right choices in our case. Another question would be: why are there so many hash joins in the 8.3 plan now? All our indexes are btrees... Any light that can be shed on what is going on with the 8.3 planner would be much appreciated. Thanks in advance.
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
On Tue, Mar 3, 2009 at 12:05 PM, Sebastjan Trepca wrote:
> -> Nested Loop (cost=0.00..31883.13 rows=2861 width=860)
> (actual time=0.089..543.497 rows=68505 loops=1)
> -> Index Scan using core_accessor_fresh_idx on
> core_accessor (cost=0.00..5460.07 rows=2970 width=92) (actual
> time=0.068..54.921 rows=69312 loops=1)
> Index Cond: ((slot_id = 472) AND (slot_type_id =
> 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
> (publish_state >= 60))

That index scan on core_accessor_fresh_idx has a pretty big disparity between what the planner expects to get (2970 rows) and what it actually gets (69312 rows). You should try increasing the statistics target if you haven't, then re-analyze and try the query again to see if the planner picks something better. The default of 10 is pretty small; try 100, or higher.

--
- David T. Wilson
david.t.wil...@gmail.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres 8.3, four times slower queries?
On Tue, 3 Mar 2009, Aaron Guyon wrote:
> Thank you for looking at this, Tom. Yes, we have made sure we are
> comparing apples to apples here. The postgresql.conf files are
> identical, as are the configure flags [...]
> However, the db was not analyzed.
> [remainder of Aaron's reply to Tom Lane quoted in full in the
> original message, trimmed]

If you haven't done a VACUUM ANALYZE on either installation then Postgres' idea of what sort of data is in the database is unpredictable, and as a result it's not surprising that the two systems guess differently about what sort of plan is going to be most efficient.

Try doing VACUUM ANALYZE on both databases and see what the results are.

David Lang
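The point David makes can be seen in miniature outside Postgres as well. The sketch below uses Python's sqlite3 module purely as an illustration (the table and index names are made up); like Postgres' ANALYZE, SQLite's ANALYZE samples the data and records row-count statistics that the planner then consults. Until it runs, the engine has recorded no statistics at all, so selectivity guesses are blind defaults:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (k INTEGER, v TEXT)")
cur.execute("CREATE INDEX t_k_idx ON t (k)")
# Skewed data: 10,000 rows but only 10 distinct values of k.
cur.executemany("INSERT INTO t VALUES (?, ?)",
                [(i % 10, "x" * 20) for i in range(10_000)])

# Before ANALYZE the statistics table does not even exist yet.
stats = cur.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_stat1'"
).fetchone()[0]
print("stat tables before ANALYZE:", stats)

cur.execute("ANALYZE")  # sample the table and record per-index statistics
rows = cur.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 't_k_idx'"
).fetchall()
# stat is "<total rows> <avg rows per distinct key>"
print("recorded stats for t_k_idx:", rows)
```

The recorded "10000 1000" tells the planner that an equality match on k returns about a tenth of the table; with no statistics it would have had to guess. The same mechanism is why comparing an analyzed 8.2 database against an unanalyzed 8.3 one says little about either planner.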
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
On Tue, Mar 3, 2009 at 12:20 PM, Sebastjan Trepca wrote:
> But it's already attached in the first mail or am I missing something?
>
> If you don't see it, check this: http://pastebin.com/d71b996d0

Whoops, sorry, I thought you had sent plain EXPLAIN. I see it now.

The lowest level at which I see a problem is here:

-> Index Scan using core_accessor_fresh_idx on core_accessor
   (cost=0.00..5460.07 rows=2970 width=92) (actual time=0.068..54.921
   rows=69312 loops=1)
     Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label =
     ''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))

For some reason it expects 2970 rows but gets 69312.

A good place to start is to change your default_statistics_target value to 100 in postgresql.conf, restart postgresql, and re-ANALYZE.

...Robert
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
Set statistics to 1000, reanalyzed, and got exactly the same results:

noovo-new=# explain analyze SELECT "core_accessor"."id", ...
[same query as in the first mail]

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit (cost=31716.13..31716.14 rows=5 width=860) (actual time=711.340..711.343 rows=5 loops=1)
   -> Sort (cost=31716.13..31722.19 rows=2424 width=860) (actual time=711.339..711.339 rows=5 loops=1)
         Sort Key: core_accessor.date_posted, core_accessor.nooximity
         Sort Method: top-N heapsort Memory: 31kB
         -> Nested Loop (cost=0.00..31675.87 rows=2424 width=860) (actual time=0.076..544.039 rows=68505 loops=1)
               -> Index Scan using core_accessor_fresh_idx on core_accessor (cost=0.00..9234.77 rows=2511 width=92) (actual time=0.058..55.225 rows=69312 loops=1)
                     Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))
               -> Index Scan using core_base_pkey on core_base (cost=0.00..8.92 rows=1 width=768) (actual time=0.005..0.005 rows=1 loops=69312)
                     Index Cond: ((core_base.object_id = core_accessor.object_id) AND (core_base.content_type_id = core_accessor.content_type_id))
 Total runtime: 711.443 ms
(10 rows)

This is how I did it:

noovo-new=# alter table core_accessor alter column slot_id set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column slot_type_id set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column label set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column user_id set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column role set statistics 1000;
ALTER TABLE
noovo-new=# alter table core_accessor alter column publish_state set statistics 1000;
ALTER TABLE
noovo-new=# analyze core_accessor;
ANALYZE

Sebastjan

On Tue, Mar 3, 2009 at 6:34 PM, David Wilson wrote:
> That index scan on core_accessor_fresh_idx has a pretty big disparity
> between what the planner expects to get (2970 rows) and what it
> actually gets (69312 rows). You should try increasing the statistics
> target if you haven't, then re-analyze and try the query again to see
> if the planner picks something better. The default of 10 is pretty
> small; try 100, or higher.
Re: [PERFORM] Postgres 8.3, four times slower queries?
On Tue, Mar 3, 2009 at 12:38 PM, wrote:
> if you haven't done a VACUUM ANALYZE on either installation then
> postgres' idea of what sort of data is in the database is
> unpredictable, and as a result it's not surprising that the two
> systems guess differently about what sort of plan is going to be most
> efficient.
>
> try doing VACUUM ANALYZE on both databases and see what the results are.
>
> David Lang

These are the results with vacuum analyze:

8.2.12:  624.366 ms
8.3.3:  1273.601 ms

                                  QUERY PLAN
------------------------------------------------------------------------------
 Unique (cost=9832.91..9832.92 rows=2 width=52) (actual time=623.808..623.808 rows=0 loops=1)
   -> Sort (cost=9832.91..9832.92 rows=2 width=52) (actual time=623.799..623.799 rows=0 loops=1)
         Sort Key: t8.id
         -> Nested Loop (cost=0.00..9832.90 rows=2 width=52) (actual time=623.467..623.467 rows=0 loops=1)
               Join Filter: ((t2.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR
                             (t3.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR
                             (t6.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR
                             (t7.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR
                             (t8.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR
                             (t9.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR
                             (t10.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)) OR
                             (t11.not_modified_since >= to_timestamp('2009-02-24T11:19:57'::text, 'YYYY/MM/DD HH:MI:SS'::text)))
               -> Nested Loop (cost=0.00..7381.49 rows=598 width=108) (actual time=75.634..556.642 rows=1104 loops=1)
                     -> Nested Loop (cost=0.00..5087.76 rows=602 width=59) (actual time=71.744..523.690 rows=1104 loops=1)
                           -> Nested Loop (cost=0.00..1171.66 rows=165 width=51) (actual time=66.427..499.798 rows=332 loops=1)
                                 -> Nested Loop (cost=0.00..684.77 rows=117 width=54) (actual time=38.266..440.024 rows=336 loops=1)
                                       -> Nested Loop (cost=0.00..84.08 rows=182 width=35) (actual time=16.785..402.046 rows=376 loops=1)
                                             -> Nested Loop (cost=0.00..52.33 rows=4 width=27) (actual time=11.355..11.547 rows=4 loops=1)
                                                   -> Index Scan using idx_day_part_du on day_part t10 (cost=0.00..8.28 rows=7 width=19) (actual time=0.713..0.748 rows=7 loops=1)
                                                         Index Cond: (display_unit_id = 250893::numeric)
                                                         Filter: (active <> 0::numeric)
                                                   -> Index Scan using idx_skin_day_part_id on skin t2 (cost=0.00..6.28 rows=1 width=30) (actual time=1.526..1.529 rows=1 loops=7)
                                                         Index Cond: (t2.day_part_id = t10.id)
                                                         Filter: (active <> 0::numeric)
                                             -> Index Scan using idx_skin_slot_skin_id on skin
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
Still the same :/

I raised the default_statistics_target to 600 (it was already 100). I then restarted pg and ran ANALYZE over all tables, and yet there is no effect. This is the output for core_accessor:

INFO: analyzing "public.core_accessor"
INFO: "core_accessor": scanned 291230 of 291230 pages, containing 17144315 live rows and 0 dead rows; 30 rows in sample, 17144315 estimated total rows

It now thinks there are even fewer rows in the set:

 Limit (cost=30816.49..30816.50 rows=5 width=855) (actual time=683.907..683.910 rows=5 loops=1)
   -> Sort (cost=30816.49..30822.29 rows=2321 width=855) (actual time=683.906..683.907 rows=5 loops=1)
         Sort Key: core_accessor.date_posted, core_accessor.nooximity
         Sort Method: top-N heapsort Memory: 31kB
         -> Nested Loop (cost=0.00..30777.94 rows=2321 width=855) (actual time=0.072..517.970 rows=68505 loops=1)
               -> Index Scan using core_accessor_fresh_idx on core_accessor (cost=0.00..8955.44 rows=2440 width=92) (actual time=0.056..53.107 rows=69312 loops=1)
                     Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))
               -> Index Scan using core_base_pkey on core_base (cost=0.00..8.93 rows=1 width=763) (actual time=0.004..0.005 rows=1 loops=69312)
                     Index Cond: ((core_base.object_id = core_accessor.object_id) AND (core_base.content_type_id = core_accessor.content_type_id))
 Total runtime: 684.015 ms
(10 rows)

Sebastjan

On Tue, Mar 3, 2009 at 6:40 PM, Robert Haas wrote:
> Whoops, sorry, I thought you had sent plain EXPLAIN. I see it now.
>
> The lowest level at which I see a problem is here:
>
> -> Index Scan using core_accessor_fresh_idx on core_accessor
> (cost=0.00..5460.07 rows=2970 width=92) (actual time=0.068..54.921
> rows=69312 loops=1)
> Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label =
> ''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))
>
> For some reason it expects 2970 rows but gets 69312.
>
> A good place to start is to change your default_statistics_target
> value to 100 in postgresql.conf, restart postgresql, and re-ANALYZE.
>
> ...Robert
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
Maybe this is useful, I removed the JOIN and it uses other index(core_accessor_date_idx indexes (date_posted, nooximity)), but its still hardly any better: noovo-new=# explain analyze SELECT * FROM "core_accessor" WHERE (("core_accessor"."slot_type_id" = 119 noovo-new(# AND "core_accessor"."slot_id" = 472 AND "core_accessor"."label" = E'' AND "core_accessor"."publish_state" >= 60 AND noovo-new(# "core_accessor"."role" IN (0) AND "core_accessor"."user_id" = 0)) ORDER BY "core_accessor"."date_posted" DESC, "core_accessor"."nooximity" DESC LIMIT 5 noovo-new-# ; QUERY PLAN -- Limit (cost=0.00..3709.56 rows=5 width=178) (actual time=4593.867..4597.587 rows=5 loops=1) -> Index Scan Backward using core_accessor_date_idx on core_accessor (cost=0.00..1810265.67 rows=2440 width=178) (actual time=4593.866..4597.583 rows=5 loops=1) Filter: ((publish_state >= 60) AND (slot_type_id = 119) AND (slot_id = 472) AND (label = ''::text) AND (role = 0) AND (user_id = 0)) Total runtime: 4597.632 ms (4 rows) Sebastjan On Tue, Mar 3, 2009 at 8:05 PM, Sebastjan Trepca wrote: > Still the same :/ > > I raised the default_statistics_target to 600 (it was already 100). I > then restarted pg, ran analyze through all tables and yet there is not > effect. 
> This is the output for core_accessor: > INFO: analyzing "public.core_accessor" > INFO: "core_accessor": scanned 291230 of 291230 pages, containing > 17144315 live rows and 0 dead rows; 30 rows in sample, 17144315 > estimated total rows > > It thinks there are even less rows in the set: > > - > Limit (cost=30816.49..30816.50 rows=5 width=855) (actual > time=683.907..683.910 rows=5 loops=1) > -> Sort (cost=30816.49..30822.29 rows=2321 width=855) (actual > time=683.906..683.907 rows=5 loops=1) > Sort Key: core_accessor.date_posted, core_accessor.nooximity > Sort Method: top-N heapsort Memory: 31kB > -> Nested Loop (cost=0.00..30777.94 rows=2321 width=855) > (actual time=0.072..517.970 rows=68505 loops=1) > -> Index Scan using core_accessor_fresh_idx on > core_accessor (cost=0.00..8955.44 rows=2440 width=92) (actual > time=0.056..53.107 rows=69312 loops=1) > Index Cond: ((slot_id = 472) AND (slot_type_id = > 119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND > (publish_state >= 60)) > -> Index Scan using core_base_pkey on core_base > (cost=0.00..8.93 rows=1 width=763) (actual time=0.004..0.005 rows=1 > loops=69312) > Index Cond: ((core_base.object_id = > core_accessor.object_id) AND (core_base.content_type_id = > core_accessor.content_type_id)) > Total runtime: 684.015 ms > (10 rows) > > > > > > Sebastjan > > > > On Tue, Mar 3, 2009 at 6:40 PM, Robert Haas wrote: >> On Tue, Mar 3, 2009 at 12:20 PM, Sebastjan Trepca wrote: >>> But it's already attached in the first mail or am I missing something? >>> >>> If you don't see it, check this: http://pastebin.com/d71b996d0 >> >> Woops, sorry, I thought you had sent plain EXPLAIN. I see it now. 
>>
>> The lowest level at which I see a problem is here:
>>
>> ->  Index Scan using core_accessor_fresh_idx on core_accessor
>>     (cost=0.00..5460.07 rows=2970 width=92) (actual time=0.068..54.921
>>     rows=69312 loops=1)
>>       Index Cond: ((slot_id = 472) AND (slot_type_id = 119) AND (label =
>>       ''::text) AND (user_id = 0) AND (role = 0) AND (publish_state >= 60))
>>
>> For some reason it expects 2970 rows but gets 69312.
>>
>> A good place to start is to change your default_statistics_target
>> value to 100 in postgresql.conf, restart postgresql, and re-ANALYZE.
>>
>> ...Robert

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
On Tue, Mar 3, 2009 at 2:16 PM, Sebastjan Trepca wrote:
> Maybe this is useful: I removed the JOIN and it uses the other
> index (core_accessor_date_idx, which indexes (date_posted, nooximity)),
> but it's still hardly any better:
>
> [snip: EXPLAIN ANALYZE output quoted in full earlier in the thread]
>
> Sebastjan

Well, in that case, you are being bitten by the fact that our
multi-column selectivity estimates are not very good.  The planner has
good information on how each column behaves in isolation, but not how
they act together.  I've found this to be a very difficult problem to
fix.

Which of the parameters in this query vary and which ones are
typically always the same?  Sometimes you can improve things by
creating an appropriate partial index.

...Robert
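Robert's partial-index idea could look something like the sketch below for the query shown in the thread. This is only an illustration: the index name is made up, and the choice of which predicates to freeze into the WHERE clause assumes that label, user_id, role and the publish_state floor are the parameters that never vary (the thread doesn't say which ones do).

```sql
-- Hypothetical partial index for the posted query: index only rows matching
-- the constant predicates, and keep the ORDER BY columns in the key so a
-- LIMIT 5 scan can stop after reading five entries in order.
CREATE INDEX core_accessor_slot_recent_idx
    ON core_accessor (slot_type_id, slot_id, date_posted DESC, nooximity DESC)
 WHERE label = '' AND user_id = 0 AND role = 0 AND publish_state >= 60;
```

With an index like this, the planner can read rows for one (slot_type_id, slot_id) pair already in the requested sort order, instead of fetching ~69k rows and sorting them, or walking the whole date index backward and filtering.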
Re: [PERFORM] Problems with ordering (can't force query planner to use an index)
Sebastjan Trepca writes:
> It thinks there are even fewer rows in the set:
> ->  Index Scan using core_accessor_fresh_idx on
>     core_accessor  (cost=0.00..8955.44 rows=2440 width=92) (actual
>     time=0.056..53.107 rows=69312 loops=1)
>       Index Cond: ((slot_id = 472) AND (slot_type_id =
>       119) AND (label = ''::text) AND (user_id = 0) AND (role = 0) AND
>       (publish_state >= 60))

Maybe you should get rid of this six-column index, if you'd rather the
query didn't use it.  It seems a tad overspecialized anyway.

			regards, tom lane
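A low-risk way to try Tom's suggestion is to drop the index inside a transaction, inspect the new plan, and roll back so nothing is changed on disk. (This is a standard PostgreSQL trick, not from the thread; note that DROP INDEX takes an exclusive lock on the table until the rollback, so do it on a quiet system.)

```sql
-- Test whether the planner does better without the six-column index,
-- without actually losing it:
BEGIN;
DROP INDEX core_accessor_fresh_idx;
EXPLAIN ANALYZE SELECT ...;   -- the problem query, elided here
ROLLBACK;                     -- index is back, nothing changed
```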
Re: [PERFORM] Postgres 8.3, four times slower queries?
Aaron Guyon writes:
> I find it telling that the query plan differs so much between postgres
> 8.2 and 8.3.

Well, you haven't shown us either the query or the table definitions,
so we're just guessing in the dark.

However, the occurrences of "::numeric" in the query plan make me
wonder whether all of your join keys are numeric type.  If so, the
reason 8.2 didn't use any hash joins is that it couldn't --- it didn't
have a hash method for numerics.  8.3 does, and therefore has more
flexibility of plan choice.  Comparisons on numerics aren't terribly
fast though (in either release).  I wonder whether you could change
the key columns to int or bigint.

I also find it a tad fishy that both releases are choosing *exactly*
the same join order when there is hardly anything else that is
identical about the plans --- given the cross-release variance in
rowcount estimates etc., I'd have expected at least one difference.
Are you doing something to force the join order, like running with a
small join_collapse_limit setting?  If so, maybe you shouldn't.

			regards, tom lane
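Tom's two suggestions might be tried along these lines. The table and column names are hypothetical, since the schema was never posted; only the parameter name join_collapse_limit and its default of 8 are real.

```sql
-- 1. Numeric join keys: converting NUMERIC keys to BIGINT makes comparisons
--    (and 8.3's hash joins) much cheaper.  Hypothetical table/column names:
ALTER TABLE line_items ALTER COLUMN order_id TYPE bigint;
ALTER TABLE orders     ALTER COLUMN id       TYPE bigint;
ANALYZE line_items;
ANALYZE orders;

-- 2. Forced join order: check whether join_collapse_limit was lowered in
--    this session or in postgresql.conf, and restore the default if so.
SHOW join_collapse_limit;
SET join_collapse_limit = 8;   -- the default
```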
[PERFORM] work_mem in high transaction rate database
Hello all,

In a dedicated server with 16 cores and 16GB of RAM running PostgreSQL
8.2.5, we have a database with basically two kinds of transactions:
- short transactions with a couple of updates and inserts that run all day;
- batch data loads with hundreds of inserts that run several times a day;
- one delete of thousands of lines after each batch;
- selects are made when users need reports; low concurrency here.

Today max_connections is ~2500, where the application is a cluster of
JBoss servers with a pool a bit smaller than this total.

work_mem = 1GB
maintenance_work_mem = 1GB
shared_buffers = 4GB

autovacuum takes a lot of time running on the largest tables (3 large
tables out of 50), causing some connections to have to wait for it to
finish before transactioning again.

I see a few processes (connections) using 10~20% of total system memory
and the others using no more than 1%.

What I want to ask is: is it better to keep work_mem as high as it is
today, or is it a safe bet trying to reduce this number, for example to
1 or 2MB, so I can keep the distribution of memory more balanced among
all connections?

Thanks!
Flavio Henrique A. Gurgel
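As a back-of-the-envelope check on these settings: each backend can use up to work_mem per sort or hash operation, so the worst case scales with the connection count. The fragment below is purely illustrative (the 4MB figure is an assumption, not a recommendation from this thread):

```ini
# postgresql.conf sketch -- worst case is roughly max_connections x work_mem
# (per sort/hash node, so potentially even more).  With the posted settings,
# ~2500 connections x 1GB work_mem is far beyond 16GB of RAM.
work_mem = 4MB                 # ~2500 x 4MB = ~10GB absolute worst case
maintenance_work_mem = 1GB     # used only by vacuum / index builds
shared_buffers = 4GB
```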
Re: [PERFORM] work_mem in high transaction rate database
On Tue, Mar 3, 2009 at 5:28 PM, Flavio Henrique Araque Gurgel wrote:
> Hello all
>
> In a dedicated server with 16 cores and 16GB of RAM running PostgreSQL
> 8.2.5 we have a database with basically two kinds of transactions:
>
> [snip]
>
> Today the max_connections is ~2500 where the application is a cluster of
> JBoss servers with a pool a bit smaller than this total.
> work_mem = 1GB
> maintenance_work_mem = 1GB
> shared_buffers = 4GB

Oh my lord, that is a foot gun waiting to go off.  Assuming 2k
connections, and somehow a fair number of them went active with big
sorts, you'd be able to exhaust all physical memory with about 8 to 16
connections.  Lower work_mem now.  To something like 1 to 4 Meg.  Do
not pass go.

If some oddball query really needs a lot of work_mem, and benchmarks
show that a larger work_mem helps, consider raising the work_mem
setting for that one query to something under 1G (way under 1G) that
makes it noticeably faster.  Don't allocate more than a test shows you
helps.

> autovacuum takes a lot of time running in the largest tables (3 large
> tables in 50) causing some connections to have to wait for it to finish
> to start transactioning again.

Vacuum does not block transactions, unless you're dropping tables or
something.

> I see a few processes (connections) using 10 ~ 20% of total system memory
> and the others using no more than 1%.

This is commonly misread.  It has to do with the vagaries of shared
memory allocation and accounting.  The numbers quite likely don't mean
what you think they mean.  Post the first 20 or so lines from top to
show us.
> What I want to ask is: is it better to keep the work_mem as high as it
> is today or is it a safe bet trying to reduce this number, for example
> to 1 or 2MB, so I can keep the distribution of memory more balanced
> among all connections?

Your work_mem is dangerously high.  Your current reading of top may not
actually support lowering it directly.  Since you've got 4G of
shared_buffers allocated, any process that's touched all or most of
shared_buffers memory will show as using 4G of RAM.  That's why you
should post output of top, or google on linux virtual memory and top
and what the numbers mean.

Let's say that 1% of your queries can benefit from > 100Meg work_mem,
5% from 60M, 10% from 40M, 20% from 20M, 30% from 16M, 50% from 8M, and
4M is enough for everything else to do well.

If, somehow, 100 queries fired off that could use > 100Meg, they might,
with your current settings, use all your memory and start using swap
til swap ran out, and they'd start getting out of memory errors and
failing.  This would affect all the other queries on the machine as
well.  OTOH, if you had work_mem limited to 16M and 100 of those same
queries fired off, they'd individually run a little slower, but they
wouldn't be able to run the machine out of memory.

If your work_mem and max_connections multiplied is > than some fraction
of memory, you're doing it wrong and setting your machine up for
mysterious, heavy-load failures, the worst kind.
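Scott's "raise it for just that one query" advice maps onto a per-transaction override. A sketch, with an illustrative value and a hypothetical query:

```sql
-- Keep the server-wide work_mem small; raise it only around the one query
-- that benchmarks show benefits from a big sort/hash area.
BEGIN;
SET LOCAL work_mem = '256MB';   -- reverts automatically at COMMIT/ROLLBACK
SELECT some_column, count(*)    -- hypothetical reporting query
  FROM big_table
 GROUP BY some_column;
COMMIT;
```

SET LOCAL scopes the change to the transaction, so a connection returned to the JBoss pool afterwards carries no surprise setting.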
Re: [PERFORM] work_mem in high transaction rate database
On Tue, 3 Mar 2009 18:37:42 -0700, Scott Marlowe wrote:
> Oh my lord, that is a foot gun waiting to go off.  Assuming 2k
> connections, and somehow a fair number of them went active with big

I absolutely agree with Scott.  Plus, set effective_cache_size
accordingly; this would help the planner.  You can read a lot about
setting this in the mailing list archives.

--
Regards,
Gábriel Ákos
E-Mail: akos.gabr...@i-logic.hu | Web: http://www.i-logic.hu
Tel/fax: +3612367353 | Mobil: +36209278894
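Gábriel's effective_cache_size point, as an illustrative conf fragment for the 16GB machine in this thread. The parameter is a planner hint, not an allocation, and the 10GB figure below is an assumption, not a measurement:

```ini
# postgresql.conf -- effective_cache_size tells the planner roughly how much
# data the OS page cache plus shared_buffers can hold; it allocates nothing.
# On a 16GB box with shared_buffers = 4GB, much of the rest caches data files:
effective_cache_size = 10GB
```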