Hi Tom: Thanks for your reply! I have self reviewed the below message at 3 different time periods to prevent from too inaccurate replies. It may be more detailed than it really needed, but it probably can show where I am lost.
On Sat, Oct 15, 2022 at 3:27 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > > If the pulled-up join doesn't go into the nullable side of the upper > join then you've changed semantics. In this case, it'd amount to reassociating a semijoin that was within the righthand side of another > semijoin to above that other semijoin. I understand your reply as: select * from t1 *left join* t2 on exists (select 1 from t3 where t3.a = t2.a); = select * from t1 *left join* (t2 semi join t3 on t3.a = t2.a) on true; -- go to nullable side != select * from (t1 *left join* t2 on true) semi join t3 on (t3.a = t2.a); -- go to above the JoinExpr I CAN follow the above. And for this case it is controlled by below code: pull_up_sublinks_qual_recurse switch (j->jointype) { case JOIN_INNER: ... case JOIN_LEFT: j->quals = pull_up_sublinks_qual_recurse(root, j->quals, &j->rarg, rightrelids, NULL, NULL); break; ... } and I didn't change this. My question is could we assume A *semijoin* B ON EXISTS (SELECT 1 FROM C on (Pbc)) = (A *semijoin* (B *semijoin* C on (Pbc))) on TRUE. (current master did) = (A *semijoin* B ON true) *semijoin* C on (Pbc) (my current thinking) Note that there is no 'left outer join' at this place. Since there are too many places called pull_up_sublinks_qual_recurse, to make things less confused, I prepared a patch for this one line change to show where exactly I changed (see patch 2); I think this is the first place I lost. The discussion of outer join > reordering in optimizer/README says that that doesn't work, I think you are talking about the graph "Valid OUTER JOIN Optimizations". I can follow until below. " SEMI joins work a little bit differently. A semijoin can be reassociated into or out of the lefthand side of another semijoin, left join, or antijoin, but not into or out of the righthand side. .. " I am unclear why (A semijoin B on (Pab)) semijoin C on (Pbc) != A semijoin (B semijoin C on (Pbc)) on (Pab); Seems both return rows from A which match both semijoin (Pab) and (Pbc). or I misunderstand the above words in the first place? At last, when I checked optimizer/README, it looks like we used a 'nullable side' while it should be 'nonnullable side'? see patch 1 for details. -- Best Regards Andy Fan
From 8327cfcf969704e9db73bc43e8f270b70263e493 Mon Sep 17 00:00:00 2001 From: Andy Fan <zhihui.fan1213@gmail.com> Date: Mon, 17 Oct 2022 08:38:41 +0800 Subject: [PATCH v1 1/2] a typo error? I think it should be nonnullable side rather thanks nullable side. --- src/backend/optimizer/README | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index 41c120e0cdf..0e4232f409f 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -217,7 +217,7 @@ RIGHT JOIN is equivalent to LEFT JOIN after switching the two input tables, so the same identities work for right joins. An example of a case that does *not* work is moving an innerjoin into or -out of the nullable side of an outer join: +out of the nonnullable side of an outer join: A leftjoin (B join C on (Pbc)) on (Pab) != (A leftjoin B on (Pab)) join C on (Pbc) -- 2.21.0
From ab226166518936d81029fbaab7c8aba9fe1550ec Mon Sep 17 00:00:00 2001 From: Andy Fan <zhihui.fan1213@gmail.com> Date: Mon, 17 Oct 2022 09:08:37 +0800 Subject: [PATCH v1 2/2] This patch doesn't mean I insist on this at all, it just shows where I'm lost. --- src/backend/optimizer/prep/prepjointree.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c index 41c7066d90a..581ff2db36b 100644 --- a/src/backend/optimizer/prep/prepjointree.c +++ b/src/backend/optimizer/prep/prepjointree.c @@ -551,12 +551,24 @@ pull_up_sublinks_qual_recurse(PlannerInfo *root, Node *node, * joins can get stacked onto either j->larg or j->rarg, * depending on which rels they reference. */ + + /* + * Here the j comes from convert_EXISTS_sublink_to_join, so + * j->jointype must be SEMIJOIN, so no left/right/full outer + * join involved. I prefer to believe that: + * + * A semi join B on exists (select 1 from C on (Pbc)) -- id1 + * = A semi join (B semi join C on (Pbc) ) on true -- id2 current master does. + * = (A semi join B on true) semi join C on (Pbc) -- id3 this patch does. + * + * I think this is the place I lost:( + */ j->quals = pull_up_sublinks_qual_recurse(root, j->quals, - &j->larg, - available_rels1, - &j->rarg, - child_rels); + jtlink1, + bms_union(available_rels1, child_rels), + NULL, + NULL); /* Return NULL representing constant TRUE */ return NULL; } -- 2.21.0