Hi!
I reviewed your patch and it was interesting for me!
Thank you for the explanation. It was really informative for me!
I think we need the restriction and that should be enough for this feature
. Given the query Richard provided before:
explain
select * from tenk1 A where exists
(select 1 from tenk2 B
where A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd));
It first can be converted to the below format without any issue.
SELECT * FROM tenk1 A SEMI JOIN tenk2 B
on A.hundred in (select C.hundred FROM tenk2 C
WHERE c.odd = b.odd);
Then without the restriction, since we only pull the varnos from
sublink->testexpr, then it is {A}, so it convert to
SELECT * FROM
(tenk1 A SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C)
ON c.odd = b.odd AND a.hundred = v.hundred)
SEMI JOIN on tenk2 B ON TRUE;
then the above query is NOT A VALID QUERY since:
1. The above query is *not* same as
SELECT * FROM (tenk1 A SEMI JOIN tenk2 B) on true
SEMI JOIN LATERAL (SELECT c.hundred FROM tenk2 C) v
ON v.odd = b.odd;
2. The above query requires b.odd when B is not available. So it is
right that an optimizer can't generate a plan for it. The fix would
be to do the restriction before applying this optimization.
I'm not sure pull-up-subquery can play any role here, IIUC, the bad thing
happens before pull-up-subquery.
I also write & analyze more test and found no issue by me
1. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up to rarg of the left join since A.hundred is not
available.
2. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
==> should not be pull-up to rarg of the left join since A.odd is not
available.
3. SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should be pull-up to rarg of left join.
4. SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> pull-up as expected.
5. SELECT * FROM tenk1 A RIGHT JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
==> should not be pull-up into larg of left join since b.odd is not
available.
After reviewing, I want to suggest some changes related to the code and
tests.
First of all, I think, it would be better to "treat" change to
"consider" and rewrite the pull-up check condition in two lines:
/*
* If the sub-select refers to any Vars of the parent query, we so let's
* considering it as LATERAL. (Vars of higher levels don't matter here.)
*/
use_lateral = !bms_is_empty(sub_ref_outer_relids) &&
bms_is_subset(sub_ref_outer_relids, available_rels);
if (!use_lateral && !bms_is_empty(sub_ref_outer_relids))
return NULL;
Secondly, I noticed another interesting feature in your patch and I
think it could be added to the test.
If we get only one row from the aggregated subquery, we can pull-up it
in the subquery scan filter.
postgres=# explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
QUERY PLAN
--------------------------------------------------------------
Nested Loop Left Join
-> Seq Scan on tenk1 a
-> Materialize
-> Nested Loop
-> Seq Scan on tenk2 b
*-> Subquery Scan on "ANY_subquery"
Filter: (b.hundred = "ANY_subquery".min)*
-> Aggregate
-> Seq Scan on tenk2 c
Filter: (odd = b.odd)
(10 rows)
It was impossible without your patch:
postgres=# explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
QUERY PLAN
---------------------------------------------------
Nested Loop Left Join
-> Seq Scan on tenk1 a
-> Materialize
-> Seq Scan on tenk2 b
Filter: (SubPlan 1)
SubPlan 1
-> Aggregate
-> Seq Scan on tenk2 c
Filter: (odd = b.odd)
(9 rows)
And I found an alternative query, when aggregated sublink will pull-up
into JoinExpr condition.
explain (costs off)
SELECT * FROM tenk1 A LEFT JOIN tenk2 B
ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
QUERY PLAN
-------------------------------------------------------------
Nested Loop Left Join
-> Seq Scan on tenk1 a
-> Materialize
-> Hash Semi Join
*Hash Cond: (b.hundred = "ANY_subquery".count)*
-> Seq Scan on tenk2 b
-> Hash
-> Subquery Scan on "ANY_subquery"
-> HashAggregate
Group Key: c.odd
-> Seq Scan on tenk2 c
(11 rows)
Unfortunately, I found a request when sublink did not pull-up, as in the
examples above. I couldn't quite figure out why.
create table a (x int, y int, z int, t int);
create table b (x int, t int);
create unique index on a (t, x);
create index on b (t,x);
insert into a select id, id, id, id FROM generate_series(1,100000) As id;
insert into b select id, id FROM generate_series(1,1000) As id;
explain (analyze, costs off, buffers)
select b.x, b.x, a.y
from b
left join a
on b.x=a.x and
*b.t in
(select max(a0.t) *
from a a0
where a0.x = b.x and
a0.t = b.t);
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Hash Right Join (actual time=1.150..58.512 rows=1000 loops=1)
Hash Cond: (a.x = b.x)
*Join Filter: (SubPlan 2)*
Buffers: shared hit=3546
-> Seq Scan on a (actual time=0.023..15.798 rows=100000 loops=1)
Buffers: shared hit=541
-> Hash (actual time=1.038..1.042 rows=1000 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 72kB
Buffers: shared hit=5
-> Seq Scan on b (actual time=0.047..0.399 rows=1000 loops=1)
Buffers: shared hit=5
SubPlan 2
-> Result (actual time=0.018..0.018 rows=1 loops=1000)
Buffers: shared hit=3000
InitPlan 1 (returns $2)
-> Limit (actual time=0.015..0.016 rows=1 loops=1000)
Buffers: shared hit=3000
-> Index Only Scan using a_t_x_idx on a a0 (actual
time=0.014..0.014 rows=1 loops=1000)
Index Cond: ((t IS NOT NULL) AND (t = b.t) AND
(x = b.x))
Heap Fetches: 1000
Buffers: shared hit=3000
Planning Time: 0.630 ms
Execution Time: 58.941 ms
(23 rows)
I thought it would be:
explain (analyze, costs off, buffers)
select b.x, b.x, a.y
from b
left join a on
b.x=a.x and
*b.t =
(select max(a0.t) *
from a a0
where a0.x = b.x and
a0.t <= b.t);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Hash Right Join (actual time=1.181..67.927 rows=1000 loops=1)
Hash Cond: (a.x = b.x)
*Join Filter: (b.t = (SubPlan 2))*
Buffers: shared hit=3546
-> Seq Scan on a (actual time=0.022..17.109 rows=100000 loops=1)
Buffers: shared hit=541
-> Hash (actual time=1.065..1.068 rows=1000 loops=1)
Buckets: 4096 Batches: 1 Memory Usage: 72kB
Buffers: shared hit=5
-> Seq Scan on b (actual time=0.049..0.401 rows=1000 loops=1)
Buffers: shared hit=5
SubPlan 2
-> Result (actual time=0.025..0.025 rows=1 loops=1000)
Buffers: shared hit=3000
InitPlan 1 (returns $2)
-> Limit (actual time=0.024..0.024 rows=1 loops=1000)
Buffers: shared hit=3000
-> Index Only Scan Backward using a_t_x_idx on a a0
(actual time=0.023..0.023 rows=1 loops=1000)
Index Cond: ((t IS NOT NULL) AND (t <= b.t)
AND (x = b.x))
Heap Fetches: 1000
Buffers: shared hit=3000
Planning Time: 0.689 ms
Execution Time: 68.220 ms
(23 rows)
If you noticed, it became possible after replacing the "in" operator
with "=".
I took the liberty of adding this to your patch and added myself as
reviewer, if you don't mind.
--
Regards,
Alena Rybakina
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 6af4a3183ac..f46ec3d1826 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1284,18 +1284,16 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
Assert(sublink->subLinkType == ANY_SUBLINK);
/*
- * If the sub-select refers to any Vars of the parent query, we have to
- * treat it as LATERAL. (Vars of higher levels don't matter here.)
+ * If the sub-select refers to any Vars of the parent query, we so let's
+ * considering it as LATERAL. (Vars of higher levels don't matter here.)
*/
sub_ref_outer_relids = pull_varnos_of_level(NULL, (Node *) subselect, 1);
- if (!bms_is_empty(sub_ref_outer_relids))
- {
- if (bms_is_subset(sub_ref_outer_relids, available_rels))
- use_lateral = true;
- else
- return NULL;
- }
+ use_lateral = !bms_is_empty(sub_ref_outer_relids) &&
+ bms_is_subset(sub_ref_outer_relids, available_rels);
+
+ if (!use_lateral && !bms_is_empty(sub_ref_outer_relids))
+ return NULL;
/*
* The test expression must contain some Vars of the parent query, else
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index 17df6b5dc9c..b70b346696d 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -2028,3 +2028,46 @@ ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
-> Seq Scan on tenk2 b
(11 rows)
+-- we can pull up the aggregate sublink into the subquery scan because of got one row.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Nested Loop
+ -> Seq Scan on tenk2 b
+ -> Memoize
+ Cache Key: b.hundred, b.odd
+ Cache Mode: binary
+ -> Subquery Scan on "ANY_subquery"
+ Filter: (b.hundred = "ANY_subquery".min)
+ -> Result
+ InitPlan 1 (returns $1)
+ -> Limit
+ -> Index Scan using tenk2_hundred on tenk2 c
+ Index Cond: (hundred IS NOT NULL)
+ Filter: (odd = b.odd)
+(16 rows)
+
+-- we can pull up the aggregate sublink into the JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
+ QUERY PLAN
+-------------------------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Hash Semi Join
+ Hash Cond: (b.hundred = "ANY_subquery".count)
+ -> Seq Scan on tenk2 b
+ -> Hash
+ -> Subquery Scan on "ANY_subquery"
+ -> HashAggregate
+ Group Key: c.odd
+ -> Seq Scan on tenk2 c
+(11 rows)
+
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index 49ce5fc99a8..5d33eb39baa 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -1000,3 +1000,13 @@ ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
explain (costs off)
SELECT * FROM tenk1 A INNER JOIN tenk2 B
ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into the subquery scan because of got one row.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into the JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
\ No newline at end of file
From 339c6a0ffd1f80e23bd2924e462568142e8a6080 Mon Sep 17 00:00:00 2001
From: Andy Fan <zhihui.fan1...@gmail.com>
Date: Wed, 11 Oct 2023 23:41:12 +0300
Subject: [PATCH] Pull up direct-correlated ANY_SUBLINK using lateral join.
Reviewed-by: Alena Rybakina <lena.riback...@yandex.ru>
---
.../postgres_fdw/expected/postgres_fdw.out | 6 +-
contrib/postgres_fdw/sql/postgres_fdw.sql | 4 +-
src/backend/optimizer/plan/subselect.c | 15 +-
src/test/regress/expected/join.out | 14 +-
src/test/regress/expected/subselect.out | 145 ++++++++++++++++++
src/test/regress/sql/join.sql | 8 +-
src/test/regress/sql/subselect.sql | 42 +++++
7 files changed, 214 insertions(+), 20 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 144c114d0fe..932a18624a8 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11633,7 +11633,7 @@ CREATE FOREIGN TABLE foreign_tbl (b int)
CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl)
SERVER loopback OPTIONS (table_name 'base_tbl');
EXPLAIN (VERBOSE, COSTS OFF)
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
QUERY PLAN
-----------------------------------------------------------------------------
Seq Scan on public.base_tbl
@@ -11641,7 +11641,7 @@ SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
Filter: (SubPlan 1)
SubPlan 1
-> Result
- Output: base_tbl.a
+ Output: base_tbl.a, (random() > '0'::double precision)
-> Append
-> Async Foreign Scan on public.foreign_tbl foreign_tbl_1
Remote SQL: SELECT NULL FROM public.base_tbl
@@ -11649,7 +11649,7 @@ SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
Remote SQL: SELECT NULL FROM public.base_tbl
(11 rows)
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
a
---
1
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index a303bfb322d..033b5b815da 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3854,8 +3854,8 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl)
SERVER loopback OPTIONS (table_name 'base_tbl');
EXPLAIN (VERBOSE, COSTS OFF)
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
-SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
+SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM foreign_tbl);
-- Clean up
DROP FOREIGN TABLE foreign_tbl CASCADE;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index 7a9fe88fec3..f46ec3d1826 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -1278,14 +1278,21 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
List *subquery_vars;
Node *quals;
ParseState *pstate;
+ Relids sub_ref_outer_relids = NULL;
+ bool use_lateral = false;
Assert(sublink->subLinkType == ANY_SUBLINK);
/*
- * The sub-select must not refer to any Vars of the parent query. (Vars of
- * higher levels should be okay, though.)
+ * If the sub-select refers to any Vars of the parent query, we so let's
+ * considering it as LATERAL. (Vars of higher levels don't matter here.)
*/
- if (contain_vars_of_level((Node *) subselect, 1))
+ sub_ref_outer_relids = pull_varnos_of_level(NULL, (Node *) subselect, 1);
+
+ use_lateral = !bms_is_empty(sub_ref_outer_relids) &&
+ bms_is_subset(sub_ref_outer_relids, available_rels);
+
+ if (!use_lateral && !bms_is_empty(sub_ref_outer_relids))
return NULL;
/*
@@ -1323,7 +1330,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
nsitem = addRangeTableEntryForSubquery(pstate,
subselect,
makeAlias("ANY_subquery", NIL),
- false,
+ use_lateral,
false);
rte = nsitem->p_rte;
parse->rtable = lappend(parse->rtable, rte);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index b95d30f6586..156aca128ab 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5251,7 +5251,7 @@ reset enable_nestloop;
explain (costs off)
select a.unique1, b.unique2
from onek a left join onek b on a.unique1 = b.unique2
- where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+ where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
QUERY PLAN
----------------------------------------------------------
Hash Join
@@ -5267,7 +5267,7 @@ select a.unique1, b.unique2
select a.unique1, b.unique2
from onek a left join onek b on a.unique1 = b.unique2
- where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+ where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
unique1 | unique2
---------+---------
123 | 123
@@ -7173,12 +7173,12 @@ select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
lateral (select * from
(select * from int8_tbl t2
- where q1 = any (select q2 from int8_tbl t3
+ where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
- QUERY PLAN
-----------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------
Nested Loop
Output: "*VALUES*".column1, t1.q1, t1.q2, ss2.q1, ss2.q2
-> Seq Scan on public.int8_tbl t1
@@ -7195,7 +7195,7 @@ lateral (select * from int8_tbl t1,
Filter: (SubPlan 3)
SubPlan 3
-> Result
- Output: t3.q2
+ Output: t3.q2, (random() > '0'::double precision)
One-Time Filter: $4
InitPlan 1 (returns $2)
-> Result
@@ -7212,7 +7212,7 @@ select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
lateral (select * from
(select * from int8_tbl t2
- where q1 = any (select q2 from int8_tbl t3
+ where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
diff --git a/src/test/regress/expected/subselect.out b/src/test/regress/expected/subselect.out
index a3a4d03d104..b70b346696d 100644
--- a/src/test/regress/expected/subselect.out
+++ b/src/test/regress/expected/subselect.out
@@ -1926,3 +1926,148 @@ select * from x for update;
Output: subselect_tbl.f1, subselect_tbl.f2, subselect_tbl.f3
(2 rows)
+-- Pull-up the direct-correlated ANY_SUBLINK
+explain (costs off)
+select * from tenk1 A where hundred in (select hundred from tenk2 B where B.odd = A.odd);
+ QUERY PLAN
+------------------------------------------------------------
+ Hash Join
+ Hash Cond: ((a.odd = b.odd) AND (a.hundred = b.hundred))
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> HashAggregate
+ Group Key: b.odd, b.hundred
+ -> Seq Scan on tenk2 b
+(7 rows)
+
+explain (costs off)
+select * from tenk1 A where exists
+(select 1 from tenk2 B
+where A.hundred in (select C.hundred FROM tenk2 C
+WHERE c.odd = b.odd));
+ QUERY PLAN
+---------------------------------
+ Nested Loop Semi Join
+ Join Filter: (SubPlan 1)
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Seq Scan on tenk2 b
+ SubPlan 1
+ -> Seq Scan on tenk2 c
+ Filter: (odd = b.odd)
+(8 rows)
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.hundred is not avaiable.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+ QUERY PLAN
+---------------------------------
+ Nested Loop Left Join
+ Join Filter: (SubPlan 1)
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Seq Scan on tenk2 b
+ SubPlan 1
+ -> Seq Scan on tenk2 c
+ Filter: (odd = b.odd)
+(8 rows)
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.odd is not avaiable for this.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
+ QUERY PLAN
+---------------------------------
+ Nested Loop Left Join
+ Join Filter: (SubPlan 1)
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Seq Scan on tenk2 b
+ SubPlan 1
+ -> Seq Scan on tenk2 c
+ Filter: (odd = a.odd)
+(8 rows)
+
+-- should be able to pull up since all the references is available
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+ QUERY PLAN
+------------------------------------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Hash Join
+ Hash Cond: ((b.odd = c.odd) AND (b.hundred = c.hundred))
+ -> Seq Scan on tenk2 b
+ -> Hash
+ -> HashAggregate
+ Group Key: c.odd, c.hundred
+ -> Seq Scan on tenk2 c
+(10 rows)
+
+-- we can pull up the sublink into the inner JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A INNER JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+ QUERY PLAN
+-------------------------------------------------
+ Hash Join
+ Hash Cond: (c.odd = b.odd)
+ -> Hash Join
+ Hash Cond: (a.hundred = c.hundred)
+ -> Seq Scan on tenk1 a
+ -> Hash
+ -> HashAggregate
+ Group Key: c.odd, c.hundred
+ -> Seq Scan on tenk2 c
+ -> Hash
+ -> Seq Scan on tenk2 b
+(11 rows)
+
+-- we can pull up the aggregate sublink into the subquery scan because of got one row.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Nested Loop
+ -> Seq Scan on tenk2 b
+ -> Memoize
+ Cache Key: b.hundred, b.odd
+ Cache Mode: binary
+ -> Subquery Scan on "ANY_subquery"
+ Filter: (b.hundred = "ANY_subquery".min)
+ -> Result
+ InitPlan 1 (returns $1)
+ -> Limit
+ -> Index Scan using tenk2_hundred on tenk2 c
+ Index Cond: (hundred IS NOT NULL)
+ Filter: (odd = b.odd)
+(16 rows)
+
+-- we can pull up the aggregate sublink into the JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
+ QUERY PLAN
+-------------------------------------------------------------
+ Nested Loop Left Join
+ -> Seq Scan on tenk1 a
+ -> Materialize
+ -> Hash Semi Join
+ Hash Cond: (b.hundred = "ANY_subquery".count)
+ -> Seq Scan on tenk2 b
+ -> Hash
+ -> Subquery Scan on "ANY_subquery"
+ -> HashAggregate
+ Group Key: c.odd
+ -> Seq Scan on tenk2 c
+(11 rows)
+
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 3e5032b04dd..2121caf7e3e 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1852,11 +1852,11 @@ reset enable_nestloop;
explain (costs off)
select a.unique1, b.unique2
from onek a left join onek b on a.unique1 = b.unique2
- where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+ where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
select a.unique1, b.unique2
from onek a left join onek b on a.unique1 = b.unique2
- where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1);
+ where (b.unique2, random() > 0) = any (select q1, random() > 0 from int8_tbl c where c.q1 < b.unique1);
--
-- test full-join strength reduction
@@ -2578,7 +2578,7 @@ select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
lateral (select * from
(select * from int8_tbl t2
- where q1 = any (select q2 from int8_tbl t3
+ where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
@@ -2587,7 +2587,7 @@ select * from (values (0), (1)) v(id),
lateral (select * from int8_tbl t1,
lateral (select * from
(select * from int8_tbl t2
- where q1 = any (select q2 from int8_tbl t3
+ where (q1, random() > 0) = any (select q2, random() > 0 from int8_tbl t3
where q2 = (select greatest(t1.q1,t2.q2))
and (select v.id=0)) offset 0) ss2) ss
where t1.q1 = ss.q2) ss0;
diff --git a/src/test/regress/sql/subselect.sql b/src/test/regress/sql/subselect.sql
index bd2e39efd48..5d33eb39baa 100644
--- a/src/test/regress/sql/subselect.sql
+++ b/src/test/regress/sql/subselect.sql
@@ -968,3 +968,45 @@ select * from (with x as (select 2 as y) select * from x) ss;
explain (verbose, costs off)
with x as (select * from subselect_tbl)
select * from x for update;
+
+-- Pull-up the direct-correlated ANY_SUBLINK
+explain (costs off)
+select * from tenk1 A where hundred in (select hundred from tenk2 B where B.odd = A.odd);
+
+explain (costs off)
+select * from tenk1 A where exists
+(select 1 from tenk2 B
+where A.hundred in (select C.hundred FROM tenk2 C
+WHERE c.odd = b.odd));
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.hundred is not avaiable.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we should only try to pull up the sublink into RHS of a left join
+-- but a.odd is not avaiable for this.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = a.odd);
+
+-- should be able to pull up since all the references is available
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the sublink into the inner JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A INNER JOIN tenk2 B
+ON A.hundred in (SELECT c.hundred FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into the subquery scan because of got one row.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT min(c.hundred) FROM tenk2 C WHERE c.odd = b.odd);
+
+-- we can pull up the aggregate sublink into the JoinExpr.
+explain (costs off)
+SELECT * FROM tenk1 A LEFT JOIN tenk2 B
+ON B.hundred in (SELECT count(c.hundred) FROM tenk2 C group by (c.odd));
\ No newline at end of file
--
2.34.1