In the past we pull-up the ANY-sublink with 2 steps, the first step is to pull up the sublink as a subquery, and the next step is to pull up the subquery if it is allowed. The benefits of this method are obvious, pulling up the subquery has more requirements, even if we can just finish the first step, we still get huge benefits. However the bad stuff happens if varlevelsup = 1 involves, things fail at step 1.
convert_ANY_sublink_to_join ... if (contain_vars_of_level((Node *) subselect, 1)) return NULL; In this patch we distinguish the above case and try to pull-up it within one step if it is helpful, It looks to me that what we need to do is just transform it to EXIST-SUBLINK. The only change is transforming the format of SUBLINK, so outer-join / pull-up as semi-join is unrelated, so the correctness should not be an issue. I can help with the following query very much. master: explain (costs off, analyze) select * from tenk1 t1 where hundred in (select hundred from tenk2 t2 where t2.odd = t1.odd and even in (select even from tenk1 t3 where t3.fivethous = t2.fivethous)) and even > 0; QUERY PLAN ------------------------------------------------------------------------------------ Seq Scan on tenk1 t1 (actual time=0.023..234.955 rows=10000 loops=1) Filter: ((even > 0) AND (SubPlan 2)) SubPlan 2 -> Seq Scan on tenk2 t2 (actual time=0.023..0.023 rows=1 loops=10000) Filter: ((odd = t1.odd) AND (SubPlan 1)) Rows Removed by Filter: 94 SubPlan 1 -> Seq Scan on tenk1 t3 (actual time=0.011..0.011 rows=1 loops=10000) Filter: (fivethous = t2.fivethous) Rows Removed by Filter: 94 Planning Time: 0.169 ms Execution Time: 235.488 ms (12 rows) patched: explain (costs off, analyze) select * from tenk1 t1 where hundred in (select hundred from tenk2 t2 where t2.odd = t1.odd and even in (select even from tenk1 t3 where t3.fivethous = t2.fivethous)) and even > 0; QUERY PLAN -------------------------------------------------------------------------------------------------- Hash Join (actual time=13.102..17.676 rows=10000 loops=1) Hash Cond: ((t1.odd = t2.odd) AND (t1.hundred = t2.hundred)) -> Seq Scan on tenk1 t1 (actual time=0.014..1.702 rows=10000 loops=1) Filter: (even > 0) -> Hash (actual time=13.080..13.082 rows=100 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 12kB -> HashAggregate (actual time=13.041..13.060 rows=100 loops=1) Group Key: t2.odd, t2.hundred Batches: 1 Memory Usage: 73kB -> Hash Join (actual time=8.044..11.296 rows=10000 loops=1) Hash Cond: ((t3.fivethous = t2.fivethous) AND (t3.even = t2.even)) -> HashAggregate (actual time=4.054..4.804 rows=5000 loops=1) Group Key: t3.fivethous, t3.even Batches: 1 Memory Usage: 465kB -> Seq Scan on tenk1 t3 (actual time=0.002..0.862 rows=10000 loops=1) -> Hash (actual time=3.962..3.962 rows=10000 loops=1) Buckets: 16384 Batches: 1 Memory Usage: 597kB -> Seq Scan on tenk2 t2 (actual time=0.004..2.289 rows=10000 loops=1) Planning Time: 0.426 ms Execution Time: 18.129 ms (20 rows) The execution time is 33ms (patched) VS 235ms (master). The planning time is 0.426ms (patched) VS 0.169ms (master). I think the extra planning time comes from the search space increasing a lot and that's where the better plan comes. I used below queries to measure how much effort we made but got nothing: run twice in 1 session and just count the second planning time. explain (costs off, analyze) select * from tenk1 t1 where (hundred, odd) in (select hundred, odd from tenk2 t2 where (even, fivethous) in (select even, fivethous from tenk1 t3)); psql regression -f 1.sql | grep 'Planning Time' | tail -1 master: Planning Time: 0.430 ms Planning Time: 0.551 ms Planning Time: 0.316 ms Planning Time: 0.342 ms Planning Time: 0.390 ms patched: Planning Time: 0.405 ms Planning Time: 0.406 ms Planning Time: 0.433 ms Planning Time: 0.371 ms Planning Time: 0.425 ms I think this can show us the extra planning effort is pretty low. This topic has been raised many times, at least at [1] [2]. and even MySQL can support some simple but common cases. I think we can do something helpful as well. Any feedback is welcome. [1] https://www.postgresql.org/message-id/3691.1342650974%40sss.pgh.pa.us [2] https://www.postgresql.org/message-id/can_9jtx7n+cxeqlnu_uhxx+escsgxllungazt6sjvdpt7to...@mail.gmail.com -- Best Regards Andy Fan
v1-0001-Pull-up-the-direct-correlated-ANY_SUBLINK-like-EX.patch
Description: Binary data