Andy Fan <zhihui.fan1...@gmail.com> writes: > In the past we pull-up the ANY-sublink with 2 steps, the first step is to > pull up the sublink as a subquery, and the next step is to pull up the > subquery if it is allowed. The benefits of this method are obvious, > pulling up the subquery has more requirements, even if we can just finish > the first step, we still get huge benefits. However the bad stuff happens > if varlevelsup = 1 involves, things fail at step 1.
> convert_ANY_sublink_to_join ... > if (contain_vars_of_level((Node *) subselect, 1)) > return NULL; > In this patch we distinguish the above case and try to pull-up it within > one step if it is helpful, It looks to me that what we need to do is just > transform it to EXIST-SUBLINK. This patch seems awfully messy to me. The fact that you're having to duplicate stuff done elsewhere suggests at the least that you've not plugged the code into the best place. Looking again at that contain_vars_of_level restriction, I think the reason for it was just to avoid making a FROM subquery that has outer references, and the reason we needed to avoid that was merely that we didn't have LATERAL at the time. So I experimented with the attached. It seems to work, in that we don't get wrong answers from any of the small number of places that are affected. (I wonder though whether those test cases still test what they were intended to, particularly the postgres_fdw one. We might have to try to hack them some more to not get affected by this optimization.) Could do with more test cases, no doubt. One thing I'm not at all clear about is whether we need to restrict the optimization so that it doesn't occur if the subquery contains outer references falling outside available_rels. I think that that case is covered by is_simple_subquery() deciding later to not pull up the subquery based on LATERAL restrictions, but maybe that misses something. I'm also wondering whether the similar restriction in convert_EXISTS_sublink_to_join could be removed similarly. In this light it was a mistake for convert_EXISTS_sublink_to_join to manage the pullup itself rather than doing it in the two-step fashion that convert_ANY_sublink_to_join does it. regards, tom lane
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 558e94b845..c07280d836 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -11377,19 +11377,19 @@ CREATE FOREIGN TABLE foreign_tbl2 () INHERITS (foreign_tbl) SERVER loopback OPTIONS (table_name 'base_tbl'); EXPLAIN (VERBOSE, COSTS OFF) SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl); - QUERY PLAN ------------------------------------------------------------------------------ - Seq Scan on public.base_tbl + QUERY PLAN +--------------------------------------------------------------------------- + Nested Loop Semi Join Output: base_tbl.a - Filter: (SubPlan 1) - SubPlan 1 - -> Result - Output: base_tbl.a - -> Append - -> Async Foreign Scan on public.foreign_tbl foreign_tbl_1 - Remote SQL: SELECT NULL FROM public.base_tbl - -> Async Foreign Scan on public.foreign_tbl2 foreign_tbl_2 - Remote SQL: SELECT NULL FROM public.base_tbl + -> Seq Scan on public.base_tbl + Output: base_tbl.a, base_tbl.b + Filter: (base_tbl.a IS NOT NULL) + -> Materialize + -> Append + -> Async Foreign Scan on public.foreign_tbl foreign_tbl_1 + Remote SQL: SELECT NULL FROM public.base_tbl + -> Async Foreign Scan on public.foreign_tbl2 foreign_tbl_2 + Remote SQL: SELECT NULL FROM public.base_tbl (11 rows) SELECT a FROM base_tbl WHERE a IN (SELECT a FROM foreign_tbl); diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c index 92e3338584..3d4645a154 100644 --- a/src/backend/optimizer/plan/subselect.c +++ b/src/backend/optimizer/plan/subselect.c @@ -1271,6 +1271,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, JoinExpr *result; Query *parse = root->parse; Query *subselect = (Query *) sublink->subselect; + bool has_level_1_vars; Relids upper_varnos; int rtindex; ParseNamespaceItem *nsitem; @@ -1283,11 +1284,10 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, Assert(sublink->subLinkType == ANY_SUBLINK); /* - * The sub-select must not refer to any Vars of the parent query. (Vars of - * higher levels should be okay, though.) + * If the sub-select refers to any Vars of the parent query, we have to + * treat it as LATERAL. (Vars of higher levels don't matter here.) */ - if (contain_vars_of_level((Node *) subselect, 1)) - return NULL; + has_level_1_vars = contain_vars_of_level((Node *) subselect, 1); /* * The test expression must contain some Vars of the parent query, else @@ -1324,7 +1324,7 @@ convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink, nsitem = addRangeTableEntryForSubquery(pstate, subselect, makeAlias("ANY_subquery", NIL), - false, + has_level_1_vars, false); rte = nsitem->p_rte; parse->rtable = lappend(parse->rtable, rte); diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 9358371072..cdaedb92b2 100644 --- a/src/test/regress/expected/join.out +++ b/src/test/regress/expected/join.out @@ -4657,17 +4657,17 @@ explain (costs off) select a.unique1, b.unique2 from onek a left join onek b on a.unique1 = b.unique2 where b.unique2 = any (select q1 from int8_tbl c where c.q1 < b.unique1); - QUERY PLAN ----------------------------------------------------------- - Hash Join - Hash Cond: (b.unique2 = a.unique1) - -> Seq Scan on onek b - Filter: (SubPlan 1) - SubPlan 1 - -> Seq Scan on int8_tbl c - Filter: (q1 < b.unique1) - -> Hash - -> Index Only Scan using onek_unique1 on onek a + QUERY PLAN +---------------------------------------------------- + Nested Loop + -> Hash Semi Join + Hash Cond: (b.unique2 = c.q1) + Join Filter: (c.q1 < b.unique1) + -> Seq Scan on onek b + -> Hash + -> Seq Scan on int8_tbl c + -> Index Only Scan using onek_unique1 on onek a + Index Cond: (unique1 = b.unique2) (9 rows) select a.unique1, b.unique2 @@ -6074,8 +6074,8 @@ lateral (select * from int8_tbl t1, where q2 = (select greatest(t1.q1,t2.q2)) and (select v.id=0)) offset 0) ss2) ss where t1.q1 = ss.q2) ss0; - QUERY PLAN ------------------------------------------------------------------ + QUERY PLAN +--------------------------------------------------------------------- Nested Loop Output: "*VALUES*".column1, t1.q1, t1.q2, ss2.q1, ss2.q2 -> Seq Scan on public.int8_tbl t1 @@ -6087,23 +6087,24 @@ lateral (select * from int8_tbl t1, -> Subquery Scan on ss2 Output: ss2.q1, ss2.q2 Filter: (t1.q1 = ss2.q2) - -> Seq Scan on public.int8_tbl t2 + -> Result Output: t2.q1, t2.q2 - Filter: (SubPlan 3) - SubPlan 3 + One-Time Filter: $3 + InitPlan 2 (returns $3) -> Result - Output: t3.q2 - One-Time Filter: $4 - InitPlan 1 (returns $2) - -> Result - Output: GREATEST($0, t2.q2) - InitPlan 2 (returns $4) - -> Result - Output: ($3 = 0) - -> Seq Scan on public.int8_tbl t3 - Output: t3.q1, t3.q2 - Filter: (t3.q2 = $2) -(27 rows) + Output: ($2 = 0) + -> Nested Loop Semi Join + Output: t2.q1, t2.q2 + Join Filter: (t2.q1 = t3.q2) + -> Seq Scan on public.int8_tbl t2 + Output: t2.q1, t2.q2 + Filter: ((SubPlan 1) = t2.q1) + SubPlan 1 + -> Result + Output: GREATEST($0, t2.q2) + -> Seq Scan on public.int8_tbl t3 + Output: t3.q1, t3.q2 +(28 rows) select * from (values (0), (1)) v(id), lateral (select * from int8_tbl t1,