Re: Memoize ANTI and SEMI JOIN inner

Alena Rybakina Sun, 30 Mar 2025 23:35:23 -0700

On 31.03.2025 06:33, David Rowley wrote:

On Mon, 31 Mar 2025 at 16:21, Alena Rybakina <a.rybak...@postgrespro.ru> wrote:

However, is it necessary to check that extra->inner_unique must be false for 
SEMI/ANTI joins here, or am I missing something? It looks a little confusing at 
this point.

If it is necessary, I don't see the reason for it. It was me that
worked on unique joins and I see no reason why a SEMI or ANTI join
couldn't be marked as unique. The reason they're not today is that the
only point of the unique join optimisation is so that during
execution, the join nodes could skip to the next outer tuple after
matching the current outer to an inner.  If the join is unique, then
there are no more join partners to find for the current outer after
matching it up once. With SEMI and ANTI joins, we skip to the next
outer tuple after finding a match anyway, so there's no point in going
to the trouble of setting the inner_unique flag.


I can't say definitively that we won't find a reason in the future
that we should set inner_unique for SEMI/ANTI joins, so I don't follow
the need for the Assert.

Maybe you're seeing something that I'm not. What do you think will
break if both flags are true?

Actually, I was mainly confused by the code itself: the check seemed to contradict the explanation. It looked like we were enforcing that inner_unique must be false for SEMI/ANTI joins, even though it's not actually important for those join types. That's why I originally proposed either adding an Assert or removing this flag from the check altogether, just to make it more explicit.

So, I agree with your explanation: by the definition of SEMI and ANTI joins, there's no need to set inner_unique, and also no need to assert against it. These joins skip to the next outer tuple once they find a match (or fail to find one, in the case of ANTI).
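To make the point above concrete, here is a minimal sketch of a toy nested loop over integer keys. It is not PostgreSQL's executor code: the enum SketchJoinKind, the function sketch_nested_loop and its parameters are hypothetical names invented for this illustration. It only shows that SEMI and ANTI joins leave the inner scan at the first match whether or not inner_unique is set, while an INNER join needs the flag to do the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical join kinds for this sketch; not PostgreSQL's JoinType. */
typedef enum
{
    SKETCH_INNER,
    SKETCH_SEMI,
    SKETCH_ANTI
} SketchJoinKind;

/*
 * Toy nested loop joining two integer arrays on equality.
 *
 * Returns the number of result rows and stores in *inner_reads how many
 * inner tuples were examined in total.
 */
size_t
sketch_nested_loop(const int *outer, size_t nouter,
                   const int *inner, size_t ninner,
                   SketchJoinKind kind, bool inner_unique,
                   size_t *inner_reads)
{
    size_t      nresult = 0;

    assert(inner_reads != NULL);
    *inner_reads = 0;

    for (size_t i = 0; i < nouter; i++)
    {
        bool        matched = false;

        for (size_t j = 0; j < ninner; j++)
        {
            (*inner_reads)++;
            if (outer[i] != inner[j])
                continue;

            matched = true;
            if (kind == SKETCH_INNER)
                nresult++;      /* one result row per matching pair */

            /*
             * SEMI and ANTI stop at the first match regardless of
             * inner_unique.  INNER may stop early only when the inner
             * side is known to be unique.
             */
            if (kind != SKETCH_INNER || inner_unique)
                break;
        }

        if (kind == SKETCH_SEMI && matched)
            nresult++;          /* outer row emitted once */
        if (kind == SKETCH_ANTI && !matched)
            nresult++;          /* outer row emitted only without a match */
    }

    return nresult;
}
```

With outer keys {1, 2, 3} and inner keys {1, 2, 4}, the INNER join examines 9 inner tuples when inner_unique is false and 6 when it is true, while SEMI and ANTI examine 6 in both cases.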


I updated the diff, where I left changes only in the code comment.

--
Regards,
Alena Rybakina
Postgres Professional

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index c75408f552b..252cb712943 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -728,14 +728,16 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
                single_mode = true;
 
        /*
-        * Memoize normally marks cache entries as complete when it runs out of
-        * tuples to read from its subplan.  However, with unique joins, Nested
-        * Loop will skip to the next outer tuple after finding the first matching
-        * inner tuple. Another case is a semi or anti join. If number of join
-        * clauses, pushed to the inner as parameterised filter no less than the
-        * number of join clauses, that means all the clauses have been pushed to
-        * the inner and any tuple coming from the inner side will be successfully
-        * used to build the join result.
+        * Normally, memoize marks cache entries as complete when it exhausts
+        * all tuples from its subplan.  However, in unique joins, Nested Loop
+        * will skip to the next outer tuple after finding the first matching
+        * inner tuple.
+        * Another case is a SEMI or ANTI joins. If the number of join clauses,
+        * pushed to the inner as parameterised filter is equal to or greater
+        * than the total number of join clauses. This implies that all relevant
+        * join conditions have been applied on the inner side, so any returned
+        * inner tuple will be guaranteed to satisfy the join condition, making
+        * it safe to memoize.
         * This means that we may not read the inner side of the
         * join to completion which leaves no opportunity to mark the cache entry
         * as complete.  To work around that, when the join is unique we
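
The problem the comment in this hunk describes can be sketched in isolation. The code below is a toy model, not the Memoize node's implementation: SketchCacheEntry and the sketch_* functions are hypothetical names, and only single_mode is borrowed from the hunk. Marking an entry complete after its first tuple in single-row mode is an assumption about the workaround, since the quoted comment is cut off before it describes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_TUPLES 8

/* Hypothetical cache entry: the tuples cached for one parameter value. */
typedef struct SketchCacheEntry
{
    int         tuples[SKETCH_MAX_TUPLES];
    size_t      ntuples;
    bool        complete;   /* true once no more tuples can follow */
} SketchCacheEntry;

/*
 * Cache one tuple read from the subplan.  In single-row mode (an assumed
 * behaviour for this sketch) the entry is treated as complete right away,
 * because the consumer will never ask for a second tuple.
 */
void
sketch_cache_store(SketchCacheEntry *entry, int tuple, bool single_mode)
{
    assert(entry->ntuples < SKETCH_MAX_TUPLES);
    entry->tuples[entry->ntuples++] = tuple;
    if (single_mode)
        entry->complete = true;
}

/* Called when the subplan reports that it has no more tuples. */
void
sketch_cache_subplan_exhausted(SketchCacheEntry *entry)
{
    entry->complete = true;
}

/*
 * Fill an entry the way a join that stops after the first inner tuple
 * would: it reads at most one tuple and never reaches the end of a
 * non-empty subplan.
 */
SketchCacheEntry
sketch_fill_first_match_only(const int *subplan, size_t n, bool single_mode)
{
    SketchCacheEntry entry = {{0}, 0, false};

    if (n == 0)
        sketch_cache_subplan_exhausted(&entry);
    else
        sketch_cache_store(&entry, subplan[0], single_mode);

    return entry;
}
```

Without single-row mode, an entry filled this way from a non-empty subplan holds one tuple but is never marked complete, so in this model it could not be trusted on a later lookup. With single-row mode the same entry is complete after the first tuple.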
