Hi,

Today, on one of the databases under my management, I found a very strange plan for a simple query. PostgreSQL 9.0.7 on Linux, random_page_cost=4, seq_page_cost=1.
The query and plan:

db=# EXPLAIN (ANALYZE, COSTS, BUFFERS)
SELECT obj_id, obj_commented, p2o_id
FROM blog_post AS obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
  AND obj_commented IS NOT NULL
  AND obj_commented > now() - '7 days'::interval
  AND obj_commented > p2o_notified
  AND p2o_notify = 't';

                                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
   Merge Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
   Buffers: shared hit=1140491
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
         Buffers: shared hit=1108452
   ->  Sort  (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)
         Sort Key: person2obj.p2o_id
         Sort Method:  quicksort  Memory: 807kB
         Buffers: shared hit=32039
         ->  Nested Loop  (cost=0.00..6537.82 rows=221 width=24) (actual time=0.097..80.129 rows=7870 loops=1)
               Buffers: shared hit=32039
               ->  Index Scan using i_blog_post_commented_active on blog_post obj  (cost=0.00..225.73 rows=1726 width=16) (actual time=0.028..17.957 rows=6010 loops=1)
                     Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
                     Buffers: shared hit=6207
               ->  Index Scan using i_person2obj_obj_notified_subscribed on person2obj  (cost=0.00..3.64 rows=1 width=24) (actual time=0.006..0.009 rows=1 loops=6010)
                     Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
                     Buffers: shared hit=25832
 Total runtime: 1349.767 ms

I don't understand why the database chose a merge join over the 1.2M-row person2obj_counters table. person2obj_counters has an index on p2oc_id (it is the primary key column).
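One way to see what the planner estimates for the nested-loop alternative (a sketch, not something from the session above) is to disable merge joins for the current session via the standard planner setting enable_mergejoin and rerun the EXPLAIN:

```sql
-- Disable merge joins for this session only; the planner then
-- falls back to its next-cheapest strategy (hash or nested loop).
SET enable_mergejoin = off;

EXPLAIN (ANALYZE, COSTS, BUFFERS)
SELECT obj_id, obj_commented, p2o_id
FROM blog_post AS obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
  AND obj_commented IS NOT NULL
  AND obj_commented > now() - '7 days'::interval
  AND obj_commented > p2o_notified
  AND p2o_notify = 't';

-- Restore the default afterwards.
RESET enable_mergejoin;
```

Comparing the estimated (not actual) costs of the two plans shows by how much the planner believes the merge join wins, without touching random_page_cost globally.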
A switch to the fast nested-loop plan can be forced with random_page_cost=10:

db=# set random_page_cost to 10;
SET
db=# EXPLAIN (ANALYZE, COSTS, BUFFERS)
SELECT obj_id, obj_commented, p2o_id
FROM blog_post AS obj
JOIN person2obj ON p2o_obj_obj_id = obj_id
JOIN person2obj_counters ON p2oc_id = p2o_id
WHERE obj_status_did = 1
  AND obj_commented IS NOT NULL
  AND obj_commented > now() - '7 days'::interval
  AND obj_commented > p2o_notified
  AND p2o_notify = 't';

                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..14810.38 rows=3 width=24) (actual time=16.910..115.110 rows=1758 loops=1)
   Buffers: shared hit=57403
   ->  Nested Loop  (cost=0.00..14616.37 rows=221 width=24) (actual time=0.088..82.342 rows=7858 loops=1)
         Buffers: shared hit=32046
         ->  Index Scan using i_blog_post_commented_active on blog_post obj  (cost=0.00..273.70 rows=1725 width=16) (actual time=0.029..16.260 rows=6009 loops=1)
               Index Cond: ((obj_commented IS NOT NULL) AND (obj_commented > (now() - '7 days'::interval)))
               Buffers: shared hit=6222
         ->  Index Scan using i_person2obj_obj_notified_subscribed on person2obj  (cost=0.00..8.30 rows=1 width=24) (actual time=0.007..0.010 rows=1 loops=6009)
               Index Cond: ((person2obj.p2o_obj_obj_id = obj.obj_id) AND (obj.obj_commented > person2obj.p2o_notified))
               Buffers: shared hit=25824
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..0.87 rows=1 width=8) (actual time=0.004..0.004 rows=0 loops=7858)
         Index Cond: (person2obj_counters.p2oc_id = person2obj.p2o_id)
         Buffers: shared hit=25357
 Total runtime: 115.465 ms
(14 rows)

OK, so what I see: the database thinks it will need to join 221 rows from the previous level to the person2obj_counters table, and somehow it calculates that a full index scan over the 1.2M entries of person2obj_counters is faster than 221 nested-loop index probes.
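One plausible reading of the numbers (my back-of-envelope interpretation, not output from the planner): when costing a merge join, PostgreSQL scales the inner scan's cost by the fraction of the inner index it expects to read before the merge keys run out (the mergejoinscansel selectivity estimate). Back-solving that fraction from the plan's own cost figures:

```python
# All numbers are taken from the merge-join plan above; the
# "fraction" is back-solved from them, not read from the planner.
sort_cost = 6546.98          # total cost of the sorted outer input
inner_full_scan = 47110.95   # full index scan cost over person2obj_counters
merge_total = 6584.69        # total cost of the Merge Join node

# Cost the planner apparently charged for the inner side:
inner_charged = merge_total - sort_cost
fraction = inner_charged / inner_full_scan

print(f"inner side charged: {inner_charged:.2f} cost units")
print(f"implied fraction of inner index scanned: {fraction:.4%}")
```

If the planner indeed expected to read well under 1% of the inner index but at run time the merge keys spanned the whole thing, the actual scan touches all 1.2M rows (997 ms in the plan) while the estimate stays tiny, which would explain the cheap-looking merge join losing so badly.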
What looks very suspicious is that the full-index-scan-plus-merge-join part adds only about 40 units to the total cost:

 Merge Join  (cost=6546.54..6584.69 rows=3 width=24) (actual time=86.262..1349.266 rows=1770 loops=1)
 ...
   ->  Index Scan using pk_person2obj_counters on person2obj_counters  (cost=0.00..47110.95 rows=1212591 width=8) (actual time=0.008..997.948 rows=1212765 loops=1)
 ...
   ->  Sort  (cost=6546.42..6546.98 rows=221 width=24) (actual time=85.877..88.373 rows=7870 loops=1)

How can that be?

-- 
Maxim Boguk
Senior PostgreSQL DBA

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com

LinkedIn profile: http://nz.linkedin.com/in/maximboguk
"If they can send one man to the moon... why can't they send them all?"

МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people. If people cannot solve the problem, try technology. People will then wish they'd listened at the first stage."