On 31.03.2025 06:33, David Rowley wrote:
On Mon, 31 Mar 2025 at 16:21, Alena Rybakina <a.rybak...@postgrespro.ru> wrote:
However, is it necessary to check that extra->inner_unique must be false for
SEMI/ANTI joins here, or am I missing something? It looks a little confusing at
this point.
If it is necessary, I don't see the reason for it. It was me that
worked on unique joins and I see no reason why a SEMI or ANTI join
couldn't be marked as unique. The reason they're not today is that the
only point of the unique join optimisation is so that during
execution, the join nodes could skip to the next outer tuple after
matching the current outer to an inner. If the join is unique, then
there are no more join partners to find for the current outer after
matching it up once. With SEMI and ANTI joins, we skip to the next
outer tuple after finding a match anyway, so there's no point in going
to the trouble of setting the inner_unique flag.
I can't say definitively that we won't find a reason in the future
that we should set inner_unique for SEMI/ANTI joins, so I don't follow
the need for the Assert.
Maybe you're seeing something that I'm not. What do you think will
break if both flags are true?
Actually, I was mainly confused by the code itself - the check seemed to
contradict the explanation. It looked like we were enforcing that
inner_unique must be false for SEMI/ANTI joins, even though it's not
actually important for those join types.
That’s why I originally proposed either adding an Assert or removing
this flag from the check altogether, just to make it more explicit.
So, I agree with your explanation — by the definition of SEMI and ANTI
joins, there's no need to set inner_unique, and also no need to assert
against it.
These joins skip to the next outer tuple once they find a match (or fail
to find one, in the case of ANTI).
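To make the skipping behaviour described above concrete, here is a minimal toy sketch in C. It is not the executor code: the names ToyJoinType, ToyJoinStats and toy_nested_loop are hypothetical and exist only for this illustration, and tuples are reduced to plain integer keys. It only shows that for SEMI and ANTI joins the inner scan stops at the first match regardless of the inner_unique flag, while for a plain inner join the flag is what allows the early exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical join kinds for this sketch; not PostgreSQL's JoinType enum. */
typedef enum { TOY_JOIN_INNER, TOY_JOIN_SEMI, TOY_JOIN_ANTI } ToyJoinType;

typedef struct
{
    size_t rows_emitted;    /* join result rows produced */
    size_t inner_probes;    /* inner tuples compared against an outer tuple */
} ToyJoinStats;

/*
 * Toy nested loop over integer keys.  After a match, the scan of the inner
 * side stops early when the join is SEMI or ANTI, or when the caller has
 * promised that the inner side is unique on the join key.
 */
static ToyJoinStats
toy_nested_loop(const int *outer, size_t nouter,
                const int *inner, size_t ninner,
                ToyJoinType jointype, bool inner_unique)
{
    ToyJoinStats stats = {0, 0};
    bool        single_match = inner_unique ||
                               jointype == TOY_JOIN_SEMI ||
                               jointype == TOY_JOIN_ANTI;

    assert(outer != NULL || nouter == 0);
    assert(inner != NULL || ninner == 0);

    for (size_t i = 0; i < nouter; i++)
    {
        bool        matched = false;

        for (size_t j = 0; j < ninner; j++)
        {
            stats.inner_probes++;
            if (outer[i] != inner[j])
                continue;

            matched = true;
            if (jointype != TOY_JOIN_ANTI)
                stats.rows_emitted++;   /* INNER and SEMI emit on a match */
            if (single_match)
                break;                  /* skip to the next outer tuple */
        }

        if (jointype == TOY_JOIN_ANTI && !matched)
            stats.rows_emitted++;       /* ANTI emits only unmatched outers */
    }

    return stats;
}
```

Calling toy_nested_loop with TOY_JOIN_SEMI or TOY_JOIN_ANTI gives the same counters whether inner_unique is true or false, which is the reason setting the flag for those join types buys nothing in this simplified model.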
I've updated the diff so that it now changes only the code comment.
--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index c75408f552b..252cb712943 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -728,14 +728,16 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
single_mode = true;
/*
- * Memoize normally marks cache entries as complete when it runs out of
- * tuples to read from its subplan. However, with unique joins, Nested
- * Loop will skip to the next outer tuple after finding the first matching
- * inner tuple. Another case is a semi or anti join. If number of join
- * clauses, pushed to the inner as parameterised filter no less than the
- * number of join clauses, that means all the clauses have been pushed to
- * the inner and any tuple coming from the inner side will be successfully
- * used to build the join result.
+ * Normally, memoize marks cache entries as complete when it exhausts
+ * all tuples from its subplan. However, in unique joins, Nested Loop
+ * will skip to the next outer tuple after finding the first matching
+ * inner tuple.
+ * Another case is a SEMI or ANTI join. If the number of join clauses
+ * pushed to the inner side as parameterised filters is equal to or
+ * greater than the total number of join clauses, then all relevant
+ * join conditions have been applied on the inner side, so any returned
+ * inner tuple is guaranteed to satisfy the join condition, making
+ * it safe to memoize.
* This means that we may not read the inner side of the
* join to completion which leaves no opportunity to mark the cache entry
* as complete. To work around that, when the join is unique we