Alexander Korotkov писал(а) 2025-03-24 11:49:
On Mon, Mar 24, 2025 at 9:07 AM Alexander Pyhalov
<a.pyha...@postgrespro.ru> wrote:
Alexander Korotkov писал(а) 2025-03-24 04:21:
> Hi, Alexander!
>
> On Tue, Mar 18, 2025 at 6:04 PM Alexander Pyhalov
> <a.pyha...@postgrespro.ru> wrote:
>> This shouldn't. When semi-join is found below left/right join, it's
>> deparsed as subquery.
>> Interesting enough, this mechanics (deparsing as subquery) is used
>> 1) for semi-joins under left/right join,
>> 2) for full outer joins when inner or outer part has some
>> remote_conds.
>>
>> The issue here is that after subquery is deparsed, we don't consider
>> if
>> its target attributes are available to the upper level
>> join . As for semi-join itself, all conditions are still deparsed on
>> left/right join boundary, they are just not propagated further.
>> This shouldn't be a problem, as they are evaluated in subquery. As for
>> left/right join without semi-join beneath it - its behavior is not
>> affected
>> (as hidden_subquery_rels is empty).
>
> Thank you for the explanation.  But I have another question.  Aren't
> the checks you've proposed too strict?  hidden_subquery_rels are
> propagated all the way to the join tree.  So, pulling conditions would
> be disables all the way to the join tree too.  Is it enough to just
> disable pulling conditions directly from semi-joins, then their
> further pulls will be disabled automatically?  See the attached patch.
> It also contains other (mostly cosmetic improvements).
>
> ------
> Regards,
> Alexander Korotkov
> Supabase

Hi. No, they are not too strict. Look at the following example

EXPLAIN (verbose, costs off)
SELECT x1.c1 FROM
(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE
ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x1
        RIGHT JOIN
(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE
ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x2
        ON (x1.c1 = x2.c1)
        LEFT JOIN
                (SELECT * FROM ft2 WHERE c2 < 11) x3
        ON (x1.c1 = x3.c1)
ORDER BY x1.c1 LIMIT 10;

With patch which you suggest, we'll deparse left part of left join as
subquery, but will try to pop c2 < 10 condition from
(8) LEFT JOIN ((6) SEMI JOIN (7)) subquery. When we look at left join of
this subquery and ft2, we still deparse left part as
subquery, so can't pop up conditions from it.

I've checked, this query seems to result in the exactly same remote
SQLs with your and mine patches.  Could you elaborate more on the
difference?  Do you think foreign_join_ok() can give different results
on this query?

Hi.
With your patch this example gives the same
ERROR:  unexpected expression in subquery output

This happens, because we don't keep knowledge that we have deparsed all semi-joins below this left join. As long as left/right join has semi-join in its left or right part, this part will be deparsed as subquery (look at the following lines in foreign_join_ok()):

else if (jointype == JOIN_LEFT || jointype == JOIN_RIGHT || jointype == JOIN_FULL)
        {
                /*
* Conditions, generated from semi-joins, should be evaluated before
                 * LEFT/RIGHT/FULL join.
                 */
                if (!bms_is_empty(fpinfo_o->hidden_subquery_rels))
                {
                        fpinfo->make_outerrel_subquery = true;
fpinfo->lower_subquery_rels = bms_add_members(fpinfo->lower_subquery_rels, outerrel->relids);
                }

                if (!bms_is_empty(fpinfo_i->hidden_subquery_rels))
                {
                        fpinfo->make_innerrel_subquery = true;
fpinfo->lower_subquery_rels = bms_add_members(fpinfo->lower_subquery_rels, innerrel->relids);
                }
        }


So, we still can't refer to its remote_conds from upper level queries (as not all Vars are available from subquery after subquery is created in one part of left or right join). It's not necessary to have semi-join for this as immediate left/right join inner or outer for inner/outer to be deparsed as subquery. But it shouldn't be an issue - we've already used remote_conds when created this subquery. What I'm trying to say - logic of 'making subquery' and extracting conditions should match (or we need more sophisticated way of forming subquery targetlist, so that extracted conditions could be used above subqueries).

--
Best regards,
Alexander Pyhalov,
Postgres Professional
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb4ed3059c4..d1acee5a5fa 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4936,6 +4936,44 @@ SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
 ----+----+----+----+----+----+----+----+----+----+----
 (0 rows)
 
+-- Semi-join conditions shouldn't pop up as left/right join clauses.
+SET enable_material TO off;
+EXPLAIN (verbose, costs off)
+SELECT x1.c1 FROM
+		(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x1
+	RIGHT JOIN
+		(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x2
+	ON (x1.c1 = x2.c1)
+ORDER BY x1.c1 LIMIT 10;
+                                                                                                                                                                                 QUERY PLAN                                                                                                                                                                                 
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Foreign Scan
+   Output: ft2.c1
+   Relations: ((public.ft2 ft2_1) LEFT JOIN ((public.ft2) SEMI JOIN (public.ft4))) SEMI JOIN (public.ft4 ft4_1)
+   Remote SQL: SELECT s9.c1 FROM ("S 1"."T 1" r6 LEFT JOIN (SELECT r4."C 1" FROM "S 1"."T 1" r4 WHERE ((r4.c2 < 10)) AND EXISTS (SELECT NULL FROM "S 1"."T 3" r5 WHERE ((r4."C 1" = r5.c1)))) s9(c1) ON (((s9.c1 = r6."C 1")))) WHERE ((r6.c2 < 10)) AND EXISTS (SELECT NULL FROM "S 1"."T 3" r7 WHERE ((r6."C 1" = r7.c1))) ORDER BY s9.c1 ASC NULLS LAST LIMIT 10::bigint
+(4 rows)
+
+SELECT x1.c1 FROM
+		(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x1
+	RIGHT JOIN
+		(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x2
+	ON (x1.c1 = x2.c1)
+ORDER BY x1.c1 LIMIT 10;
+ c1 
+----
+  2
+  4
+  6
+  8
+ 10
+ 12
+ 14
+ 16
+ 18
+ 20
+(10 rows)
+
+RESET enable_material;
 -- Can't push down semi-join with inner rel vars in targetlist
 EXPLAIN (verbose, costs off)
 SELECT ft1.c1 FROM ft1 JOIN ft2 on ft1.c1 = ft2.c1 WHERE
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 1131a8bf77e..455ddf101b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -5964,16 +5964,30 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
 			break;
 
 		case JOIN_LEFT:
-			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
+			/*
+			 * When semi-join is involved in the inner or outer part of the
+			 * left join, it's deparsed as a subquery, and we can't refer to
+			 * its vars on the upper level.
+			 */
+			if (fpinfo_i->jointype != JOIN_SEMI)
+				fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
 											  fpinfo_i->remote_conds);
-			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
+			if (fpinfo_o->jointype != JOIN_SEMI)
+				fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
 											   fpinfo_o->remote_conds);
 			break;
 
 		case JOIN_RIGHT:
-			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
+			/*
+			 * When semi-join is involved in the inner or outer part of the
+			 * right join, it's deparsed as a subquery, and we can't refer to
+			 * its vars on the upper level.
+			 */
+			if (fpinfo_o->jointype != JOIN_SEMI)
+				fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
 											  fpinfo_o->remote_conds);
-			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
+			if (fpinfo_i->jointype != JOIN_SEMI)
+				fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
 											   fpinfo_i->remote_conds);
 			break;
 
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index d45e9f8ab52..c58e8fc9912 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -1455,6 +1455,34 @@ SELECT ft2.*, ft4.* FROM ft2 INNER JOIN
 	WHERE ft2.c1 > 900
 	ORDER BY ft2.c1 LIMIT 10;
 
+-- Semi-join conditions shouldn't pop up as left/right join clauses.
+SET enable_material TO off;
+EXPLAIN (verbose, costs off)
+SELECT x1.c1 FROM
+		(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x1
+	RIGHT JOIN
+		(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x2
+	ON (x1.c1 = x2.c1)
+ORDER BY x1.c1 LIMIT 10;
+SELECT x1.c1 FROM
+		(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x1
+	RIGHT JOIN
+		(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x2
+	ON (x1.c1 = x2.c1)
+ORDER BY x1.c1 LIMIT 10;
+
+EXPLAIN (verbose, costs off)
+SELECT x1.c1 FROM
+	(SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x1
+        RIGHT JOIN
+                (SELECT * FROM ft2 WHERE EXISTS (SELECT 1 FROM ft4 WHERE ft4.c1 = ft2.c1 AND ft2.c2 < 10)) x2
+        ON (x1.c1 = x2.c1)
+        LEFT JOIN
+                (SELECT * FROM ft2 WHERE c2 < 11) x3
+        ON (x1.c1 = x3.c1)
+ORDER BY x1.c1 LIMIT 10;
+RESET enable_material;
+
 -- Can't push down semi-join with inner rel vars in targetlist
 EXPLAIN (verbose, costs off)
 SELECT ft1.c1 FROM ft1 JOIN ft2 on ft1.c1 = ft2.c1 WHERE

Reply via email to