Hello hackers, Observe the following test case (apologies if this is a well understood problem):
create temp table foo as select generate_series(1,1000000) id; create index on foo(id); create temp table bar as select id, id % 100000 = 0 as good from generate_series(1,1000000) id; create index on bar(good); analyze foo; analyze bar; explain analyze select * from foo where false or exists (select 1 from bar where good and foo.id = bar.id); -- A explain analyze select * from foo where exists (select 1 from bar where good and foo.id = bar.id); -- B These queries are trivially verified as identical but give very different plans. A gives QUERY PLAN ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Seq Scan on foo (cost=0.00..4459425.00 rows=500000 width=4) (actual time=13.299..130.271 rows=10 loops=1) Filter: (alternatives: SubPlan 1 or hashed SubPlan 2) Rows Removed by Filter: 999990 SubPlan 1 -> Index Scan using bar_good_idx on bar (cost=0.42..4.45 rows=1 width=0) (never executed) Index Cond: (good = true) Filter: (good AND (foo.id = id)) SubPlan 2 -> Index Scan using bar_good_idx on bar bar_1 (cost=0.42..4.44 rows=1 width=4) (actual time=0.024..0.055 rows=10 loops=1) Index Cond: (good = true) Filter: good Planning time: 0.103 ms Execution time: 130.312 ms B gives QUERY PLAN ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── Nested Loop (cost=4.87..12.91 rows=1 width=4) (actual time=0.075..0.161 rows=10 loops=1) -> HashAggregate (cost=4.45..4.46 rows=1 width=4) (actual time=0.058..0.060 rows=10 loops=1) Group Key: bar.id -> Index Scan using bar_good_idx on bar (cost=0.42..4.44 rows=1 width=4) (actual time=0.018..0.045 rows=10 loops=1) Index Cond: (good = true) Filter: good -> Index Only Scan using foo_id_idx on foo (cost=0.42..8.44 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=10) Index Cond: (id = bar.id) Heap Fetches: 10 Planning time: 0.193 ms Execution time: 0.187 ms This is a general problem to OR expressions while AND expressions will generally pass the optimization through. The 'old school' optimization approach is to rewrite the OR expressions to UNION ALL but this can have unpleasant downstream effects on the query in real world scenarios. The question is: can the one time filter logic be expanded such the first query can be functionally be written into the second one? This type of query happens a lot when trying to mix multiple different filtering expressions (a 'filter mode' if you will) in a single query based on a user supplied switch. Food for thought. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers