Re: A new strategy for pull-up correlated ANY_SUBLINK

Alena Rybakina Wed, 11 Oct 2023 14:01:57 -0700

Hi!

I reviewed your patch and it was interesting for me!


Thank you for the explanation. It was really informative for me!


I think we need the restriction and that should be enough for this feature
. Given the query Richard provided before:

explain
select * from tenk1 A where exists
(select 1 from tenk2 B
where A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd));

It first can be converted to the below format without any issue.

SELECT * FROM tenk1 A SEMI JOIN tenk2 B
on A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd);

Then without the restriction, since we only pull the varnos from
sublink->testexpr, then it is {A}, so it convert to

SELECT * FROM
(tenk1 A SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C)
ON c.odd = b.odd AND a.hundred = v.hundred)
SEMI JOIN on tenk2 B ON TRUE;

then the above query is NOT A VALID QUERY since:
1. The above query is *not* same as

SELECT * FROM (tenk1 A SEMI JOIN tenk2 B) on true
SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C) v
ON v.odd = b.odd;

2. The above query requires b.odd when B is not available. So it is
right that an optimizer can't generate a plan for it. The fix would
be to do the restriction before applying this optimization.

I'm not sure pull-up-subquery can play any role here, IIUC, the bad thing
happens before pull-up-subquery.

I also write & analyze more test and found no issue by me

1. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up to rarg of the left join since A.hundred is not
available.

2.  SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
==> should not be pull-up to rarg of the left join since A.odd is not
available.

3. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should be pull-up to rarg of left join.

4. SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> pull-up as expected.

5. SELECT * FROM tenk1 A RIGHT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up into larg of left join since b.odd is not
available.

After reviewing, I want to suggest some changes related to the code andtests.

First of all, I think, it would be better to "treat" change to"consider" and rewrite the pull-up check condition in two lines:


/*
 * If the sub-select refers to any Vars of the parent query, we so let's
 * considering it as LATERAL.  (Vars of higher levels don't matter here.)
 */

use_lateral = !bms_is_empty(sub_ref_outer_relids) &&
bms_is_subset(sub_ref_outer_relids, available_rels);

if (!use_lateral && !bms_is_empty(sub_ref_outer_relids))
    return NULL;

Secondly, I noticed another interesting feature in your patch and Ithink it could be added to the test.

If we get only one row from the aggregated subquery, we can pull-up itin the subquery scan filter.


postgres=# explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);

                          QUERY PLAN
--------------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Nested Loop
               ->  Seq Scan on tenk2 b
*->  Subquery Scan on "ANY_subquery"
                     Filter: (b.hundred = "ANY_subquery".min)*
                     ->  Aggregate
                           ->  Seq Scan on tenk2 c
                                 Filter: (odd = b.odd)
(10 rows)

It was impossible without your patch:

postgres=# explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
                    QUERY PLAN
---------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Seq Scan on tenk2 b
               Filter: (SubPlan 1)
               SubPlan 1
                 ->  Aggregate
                       ->  Seq Scan on tenk2 c
                             Filter: (odd = b.odd)
(9 rows)

And I found an alternative query, when aggregated sublink will pull-upinto JoinExpr condition.


explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
                         QUERY PLAN
-------------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on tenk1 a
   ->  Materialize
         ->  Hash Semi Join
*Hash Cond: (b.hundred = "ANY_subquery".count)*
               ->  Seq Scan on tenk2 b
               ->  Hash
                     ->  Subquery Scan on "ANY_subquery"
                           ->  HashAggregate
                                 Group Key: c.odd
                                 ->  Seq Scan on tenk2 c
(11 rows)

Unfortunately, I found a request when sublink did not pull-up, as in theexamples above. I couldn't quite figure out why.


create table a (x int, y int, z int, t int);
create table b (x int, t int);
create unique index on a (t, x);
create index on b (t,x);
insert into a select id, id, id, id FROM generate_series(1,100000) As id;
insert into b select id, id FROM generate_series(1,1000) As id;

explain (analyze, costs off, buffers)
select b.x, b.x, a.y
from b
    left join a
        on b.x=a.x and
*b.t in
            (select max(a0.t) *
             from a a0
             where a0.x = b.x and
                   a0.t = b.t);

QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Hash Right Join (actual time=1.150..58.512 rows=1000 loops=1)
   Hash Cond: (a.x = b.x)
*Join Filter: (SubPlan 2)*
   Buffers: shared hit=3546
   ->  Seq Scan on a (actual time=0.023..15.798 rows=100000 loops=1)
         Buffers: shared hit=541
   ->  Hash (actual time=1.038..1.042 rows=1000 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 72kB
         Buffers: shared hit=5
         ->  Seq Scan on b (actual time=0.047..0.399 rows=1000 loops=1)
               Buffers: shared hit=5
   SubPlan 2
     ->  Result (actual time=0.018..0.018 rows=1 loops=1000)
           Buffers: shared hit=3000
           InitPlan 1 (returns $2)
             ->  Limit (actual time=0.015..0.016 rows=1 loops=1000)
                   Buffers: shared hit=3000

-> Index Only Scan using a_t_x_idx on a a0 (actualtime=0.014..0.014 rows=1 loops=1000) Index Cond: ((t IS NOT NULL) AND (t = b.t) AND(x = b.x))

                         Heap Fetches: 1000
                         Buffers: shared hit=3000
 Planning Time: 0.630 ms
 Execution Time: 58.941 ms
(23 rows)

I thought it would be:

explain (analyze, costs off, buffers)
select b.x, b.x, a.y
from b
    left join a on
        b.x=a.x and
*b.t =
            (select max(a0.t) *
             from a a0
             where a0.x = b.x and
                   a0.t <= b.t);

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Hash Right Join (actual time=1.181..67.927 rows=1000 loops=1)
   Hash Cond: (a.x = b.x)
*Join Filter: (b.t = (SubPlan 2))*
   Buffers: shared hit=3546
   ->  Seq Scan on a (actual time=0.022..17.109 rows=100000 loops=1)
         Buffers: shared hit=541
   ->  Hash (actual time=1.065..1.068 rows=1000 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 72kB
         Buffers: shared hit=5
         ->  Seq Scan on b (actual time=0.049..0.401 rows=1000 loops=1)
               Buffers: shared hit=5
   SubPlan 2
     ->  Result (actual time=0.025..0.025 rows=1 loops=1000)
           Buffers: shared hit=3000
           InitPlan 1 (returns $2)
             ->  Limit (actual time=0.024..0.024 rows=1 loops=1000)
                   Buffers: shared hit=3000

-> Index Only Scan Backward using a_t_x_idx on a a0(actual time=0.023..0.023 rows=1 loops=1000) Index Cond: ((t IS NOT NULL) AND (t <= b.t)AND (x = b.x))

                         Heap Fetches: 1000
                         Buffers: shared hit=3000
 Planning Time: 0.689 ms
 Execution Time: 68.220 ms
(23 rows)

If you noticed, it became possible after replacing the "in" operatorwith "=".

I took the liberty of adding this to your patch and added myself asreviewer, if you don't mind.



--
Regards,
Alena Rybakina

diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6af4a3183ac..f46ec3d1826 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1284,18 +1284,16 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	Assert(sublink->subLinkType == ANY_SUBLINK);
 
 	/*
-	 * If the sub-select refers to any Vars of the parent query, we have to
-	 * treat it as LATERAL.  (Vars of higher levels don't matter here.)
+	 * If the sub-select refers to any Vars of the parent query, we so let's
+	 * considering it as LATERAL.  (Vars of higher levels don't matter here.)
 	 */
 	sub_ref_outer_relids = pull_varnos_of_level(NULL, (Node *) subselect, 1);
 
-	if (!bms_is_empty(sub_ref_outer_relids))
-	{
-		if (bms_is_subset(sub_ref_outer_relids, available_rels))
-			use_lateral = true;
-		else
-			return NULL;
-	}
+	use_lateral = !bms_is_empty(sub_ref_outer_relids) &&
+						 bms_is_subset(sub_ref_outer_relids, available_rels);
+
+	if (!use_lateral && !bms_is_empty(sub_ref_outer_relids))
+		return NULL;
 
 	/*
 	 * The test expression must contain some Vars of the parent query, else
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 17df6b5dc9c..b70b346696d 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2028,3 +2028,46 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
          ->  Seq Scan on tenk2 b
 (11 rows)
 
+-- we can pull up the aggregate sublink into the subquery scan because of got one row.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Nested Loop
+               ->  Seq Scan on tenk2 b
+               ->  Memoize
+                     Cache Key: b.hundred, b.odd
+                     Cache Mode: binary
+                     ->  Subquery Scan on "ANY_subquery"
+                           Filter: (b.hundred = "ANY_subquery".min)
+                           ->  Result
+                                 InitPlan 1 (returns $1)
+                                   ->  Limit
+                                         ->  Index Scan using tenk2_hundred on tenk2 c
+                                               Index Cond: (hundred IS NOT NULL)
+                                               Filter: (odd = b.odd)
+(16 rows)
+
+-- we can pull up the aggregate sublink into the JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Hash Semi Join
+               Hash Cond: (b.hundred = "ANY_subquery".count)
+               ->  Seq Scan on tenk2 b
+               ->  Hash
+                     ->  Subquery Scan on "ANY_subquery"
+                           ->  HashAggregate
+                                 Group Key: c.odd
+                                 ->  Seq Scan on tenk2 c
+(11 rows)
+
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 49ce5fc99a8..5d33eb39baa 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -1000,3 +1000,13 @@ ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
 explain (costs off)
 SELECT * FROM tenk1 A INNER JOIN tenk2 B
 ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into the subquery scan because of got one row.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into the JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
\ No newline at end of file

From 339c6a0ffd1f80e23bd2924e462568142e8a6080 Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihui.fan1...@gmail.com>
Date: Wed, 11 Oct 2023 23:41:12 +0300
Subject: [PATCH] Pull up direct-correlated ANY_SUBLINK using lateral join.
 Reviewed-by: Alena Rybakina <lena.riback...@yandex.ru>

---
 .../postgres_fdw/expected/postgres_fdw.out    |   6 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql     |   4 +-
 src/backend/optimizer/plan/subselect.c        |  15 +-
 src/test/regress/expected/join.out            |  14 +-
 src/test/regress/expected/subselect.out       | 145 ++++++++++++++++++
 src/test/regress/sql/join.sql                 |   8 +-
 src/test/regress/sql/subselect.sql            |  42 +++++
 7 files changed, 214 insertions(+), 20 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0fe..932a18624a8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11633,7 +11633,7 @@ CREATE FOREIGN TABLE foreign_tbl (b int)
 CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl)
   SERVER loopback OPTIONS (table_name 'base_tbl');
 EXPLAIN (VERBOSE, COSTS OFF)
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
                                  QUERY PLAN                                  
 -----------------------------------------------------------------------------
  Seq Scan on public.base_tbl
@@ -11641,7 +11641,7 @@ SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
    Filter: (SubPlan 1)
    SubPlan 1
      ->  Result
-           Output: base_tbl.a
+           Output: base_tbl.a, (random() > '0'::double precision)
            ->  Append
                  ->  Async Foreign Scan on public.foreign_tbl foreign_tbl_1
                        Remote SQL: SELECT NULL FROM public.base_tbl
@@ -11649,7 +11649,7 @@ SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
                        Remote SQL: SELECT NULL FROM public.base_tbl
 (11 rows)
 
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
  a 
 ---
  1
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index a303bfb322d..033b5b815da 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3854,8 +3854,8 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl)
   SERVER loopback OPTIONS (table_name 'base_tbl');
 
 EXPLAIN (VERBOSE, COSTS OFF)
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
 
 -- Clean up
 DROP FOREIGN TABLE foreign_tbl CASCADE;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 7a9fe88fec3..f46ec3d1826 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1278,14 +1278,21 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	List	   *subquery_vars;
 	Node	   *quals;
 	ParseState *pstate;
+	Relids sub_ref_outer_relids = NULL;
+	bool use_lateral = false;
 
 	Assert(sublink->subLinkType == ANY_SUBLINK);
 
 	/*
-	 * The sub-select must not refer to any Vars of the parent query. (Vars of
-	 * higher levels should be okay, though.)
+	 * If the sub-select refers to any Vars of the parent query, we so let's
+	 * considering it as LATERAL.  (Vars of higher levels don't matter here.)
 	 */
-	if (contain_vars_of_level((Node *) subselect, 1))
+	sub_ref_outer_relids = pull_varnos_of_level(NULL, (Node *) subselect, 1);
+
+	use_lateral = !bms_is_empty(sub_ref_outer_relids) &&
+						 bms_is_subset(sub_ref_outer_relids, available_rels);
+
+	if (!use_lateral && !bms_is_empty(sub_ref_outer_relids))
 		return NULL;
 
 	/*
@@ -1323,7 +1330,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
 	nsitem = addRangeTableEntryForSubquery(pstate,
 										   subselect,
 										   makeAlias("ANY_subquery", NIL),
-										   false,
+										   use_lateral,
 										   false);
 	rte = nsitem->p_rte;
 	parse->rtable = lappend(parse->rtable, rte);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b95d30f6586..156aca128ab 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5251,7 +5251,7 @@ reset enable_nestloop;
 explain (costs off)
 select a.unique1, b.unique2
   from onek a left join onek b on a.unique1 = b.unique2
-  where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+  where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
                         QUERY PLAN                        
 ----------------------------------------------------------
  Hash Join
@@ -5267,7 +5267,7 @@ select a.unique1, b.unique2
 
 select a.unique1, b.unique2
   from onek a left join onek b on a.unique1 = b.unique2
-  where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+  where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
  unique1 | unique2 
 ---------+---------
      123 |     123
@@ -7173,12 +7173,12 @@ select * from (values (0), (1)) v(id),
 lateral (select * from int8_tbl t1,
          lateral (select * from
                     (select * from int8_tbl t2
-                     where q1 = any (select q2 from int8_tbl t3
+                     where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
                                      where q2 = (select greatest(t1.q1,t2.q2))
                                        and (select v.id=0)) offset 0) ss2) ss
          where t1.q1 = ss.q2) ss0;
-                              QUERY PLAN                              
-----------------------------------------------------------------------
+                                  QUERY PLAN                                   
+-------------------------------------------------------------------------------
  Nested Loop
    Output: "*VALUES*".column1, t1.q1, t1.q2, ss2.q1, ss2.q2
    ->  Seq Scan on public.int8_tbl t1
@@ -7195,7 +7195,7 @@ lateral (select * from int8_tbl t1,
                      Filter: (SubPlan 3)
                      SubPlan 3
                        ->  Result
-                             Output: t3.q2
+                             Output: t3.q2, (random() > '0'::double precision)
                              One-Time Filter: $4
                              InitPlan 1 (returns $2)
                                ->  Result
@@ -7212,7 +7212,7 @@ select * from (values (0), (1)) v(id),
 lateral (select * from int8_tbl t1,
          lateral (select * from
                     (select * from int8_tbl t2
-                     where q1 = any (select q2 from int8_tbl t3
+                     where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
                                      where q2 = (select greatest(t1.q1,t2.q2))
                                        and (select v.id=0)) offset 0) ss2) ss
          where t1.q1 = ss.q2) ss0;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index a3a4d03d104..b70b346696d 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1926,3 +1926,148 @@ select * from x for update;
    Output: subselect_tbl.f1, subselect_tbl.f2, subselect_tbl.f3
 (2 rows)
 
+-- Pull-up the direct-correlated ANY_SUBLINK
+explain (costs off)
+select * from tenk1 A where hundred in (select hundred from tenk2 B where B.odd = A.odd);
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Hash Join
+   Hash Cond: ((a.odd = b.odd) AND (a.hundred = b.hundred))
+   ->  Seq Scan on tenk1 a
+   ->  Hash
+         ->  HashAggregate
+               Group Key: b.odd, b.hundred
+               ->  Seq Scan on tenk2 b
+(7 rows)
+
+explain (costs off)
+select * from tenk1 A where exists
+(select 1 from tenk2 B
+where A.hundred in (select C.hundred FROM tenk2 C
+WHERE c.odd = b.odd));
+           QUERY PLAN            
+---------------------------------
+ Nested Loop Semi Join
+   Join Filter: (SubPlan 1)
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Seq Scan on tenk2 b
+   SubPlan 1
+     ->  Seq Scan on tenk2 c
+           Filter: (odd = b.odd)
+(8 rows)
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.hundred is not avaiable.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+           QUERY PLAN            
+---------------------------------
+ Nested Loop Left Join
+   Join Filter: (SubPlan 1)
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Seq Scan on tenk2 b
+   SubPlan 1
+     ->  Seq Scan on tenk2 c
+           Filter: (odd = b.odd)
+(8 rows)
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.odd is not avaiable for this.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
+           QUERY PLAN            
+---------------------------------
+ Nested Loop Left Join
+   Join Filter: (SubPlan 1)
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Seq Scan on tenk2 b
+   SubPlan 1
+     ->  Seq Scan on tenk2 c
+           Filter: (odd = a.odd)
+(8 rows)
+
+-- should be able to pull up since all the references is available
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Hash Join
+               Hash Cond: ((b.odd = c.odd) AND (b.hundred = c.hundred))
+               ->  Seq Scan on tenk2 b
+               ->  Hash
+                     ->  HashAggregate
+                           Group Key: c.odd, c.hundred
+                           ->  Seq Scan on tenk2 c
+(10 rows)
+
+-- we can pull up the sublink into the inner JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A INNER JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+                   QUERY PLAN                    
+-------------------------------------------------
+ Hash Join
+   Hash Cond: (c.odd = b.odd)
+   ->  Hash Join
+         Hash Cond: (a.hundred = c.hundred)
+         ->  Seq Scan on tenk1 a
+         ->  Hash
+               ->  HashAggregate
+                     Group Key: c.odd, c.hundred
+                     ->  Seq Scan on tenk2 c
+   ->  Hash
+         ->  Seq Scan on tenk2 b
+(11 rows)
+
+-- we can pull up the aggregate sublink into the subquery scan because of got one row.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+                                      QUERY PLAN                                       
+---------------------------------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Nested Loop
+               ->  Seq Scan on tenk2 b
+               ->  Memoize
+                     Cache Key: b.hundred, b.odd
+                     Cache Mode: binary
+                     ->  Subquery Scan on "ANY_subquery"
+                           Filter: (b.hundred = "ANY_subquery".min)
+                           ->  Result
+                                 InitPlan 1 (returns $1)
+                                   ->  Limit
+                                         ->  Index Scan using tenk2_hundred on tenk2 c
+                                               Index Cond: (hundred IS NOT NULL)
+                                               Filter: (odd = b.odd)
+(16 rows)
+
+-- we can pull up the aggregate sublink into the JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
+                         QUERY PLAN                          
+-------------------------------------------------------------
+ Nested Loop Left Join
+   ->  Seq Scan on tenk1 a
+   ->  Materialize
+         ->  Hash Semi Join
+               Hash Cond: (b.hundred = "ANY_subquery".count)
+               ->  Seq Scan on tenk2 b
+               ->  Hash
+                     ->  Subquery Scan on "ANY_subquery"
+                           ->  HashAggregate
+                                 Group Key: c.odd
+                                 ->  Seq Scan on tenk2 c
+(11 rows)
+
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04dd..2121caf7e3e 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1852,11 +1852,11 @@ reset enable_nestloop;
 explain (costs off)
 select a.unique1, b.unique2
   from onek a left join onek b on a.unique1 = b.unique2
-  where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+  where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
 
 select a.unique1, b.unique2
   from onek a left join onek b on a.unique1 = b.unique2
-  where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+  where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
 
 --
 -- test full-join strength reduction
@@ -2578,7 +2578,7 @@ select * from (values (0), (1)) v(id),
 lateral (select * from int8_tbl t1,
          lateral (select * from
                     (select * from int8_tbl t2
-                     where q1 = any (select q2 from int8_tbl t3
+                     where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
                                      where q2 = (select greatest(t1.q1,t2.q2))
                                        and (select v.id=0)) offset 0) ss2) ss
          where t1.q1 = ss.q2) ss0;
@@ -2587,7 +2587,7 @@ select * from (values (0), (1)) v(id),
 lateral (select * from int8_tbl t1,
          lateral (select * from
                     (select * from int8_tbl t2
-                     where q1 = any (select q2 from int8_tbl t3
+                     where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
                                      where q2 = (select greatest(t1.q1,t2.q2))
                                        and (select v.id=0)) offset 0) ss2) ss
          where t1.q1 = ss.q2) ss0;
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index bd2e39efd48..5d33eb39baa 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -968,3 +968,45 @@ select * from (with x as (select 2 as y) select * from x) ss;
 explain (verbose, costs off)
 with x as (select * from subselect_tbl)
 select * from x for update;
+
+-- Pull-up the direct-correlated ANY_SUBLINK
+explain (costs off)
+select * from tenk1 A where hundred in (select hundred from tenk2 B where B.odd = A.odd);
+
+explain (costs off)
+select * from tenk1 A where exists
+(select 1 from tenk2 B
+where A.hundred in (select C.hundred FROM tenk2 C
+WHERE c.odd = b.odd));
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.hundred is not avaiable.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.odd is not avaiable for this.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
+
+-- should be able to pull up since all the references is available
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the sublink into the inner JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A INNER JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into the subquery scan because of got one row.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into the JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
\ No newline at end of file
-- 
2.34.1

Re: A new strategy for pull-up correlated ANY_SUBLINK

Reply via email to