Hi!
On 21.03.2025 18:56, Andrei Lepikhov wrote:
On 20/3/2025 07:02, David Rowley wrote:
On Thu, 20 Mar 2025 at 06:16, Andrei Lepikhov <lepi...@gmail.com> wrote:
How can we be sure that semi or anti-join needs only one tuple? I think
it would be enough to control the absence of join clauses and filters in
the join. Unfortunately, we only have such a guarantee in the plan
creation stage (maybe even setrefs.c). So, it seems we need to invent an
approach like AlternativeSubplan.
I suggest looking at what 9e215378d did. You might be able to also
allow semi and anti-joins providing the cache keys cover the entire
join condition. I think this might be safe as Nested Loop will only
ask its inner subnode for the first match before skipping to the next
outer row and with anti-join, there's no reason to look for additional
rows after the first. Semi-join and unique joins do the same thing in
nodeNestloop.c. To save doing additional checks at run-time, the code
does:
Thank you for the clue! I almost took the wrong direction.
I have attached the new patch, which includes corrected comments for
better clarification of the changes, as well as some additional tests.
I now feel much more confident about this version since I have
resolved that concern.
I reviewed your patch and made a couple of suggestions.
The first change is related to your comment (and the one before it). I
fixed some grammar issues and simplified the wording to make it clearer
and easier to understand.
The second change involves adding an Assert when generating the Memoize
path. Based on the existing comment and the surrounding logic (shown
below), I believe it's worth asserting that inner_unique and single_mode
are never both true at the same time, just as a safety check.
/*
* We may do this for SEMI or ANTI joins when they need only one tuple from
* the inner side to produce the result. Following if condition checks that
* rule.
*
* XXX Currently we don't attempt to mark SEMI/ANTI joins as inner_unique
* = true. Should we? See add_paths_to_joinrel()
*/
if (!extra->inner_unique && (jointype == JOIN_SEMI ||
                             jointype == JOIN_ANTI))
    single_mode = true;
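
To illustrate why the assertion should hold, here is a minimal standalone sketch of the condition above. It is not PostgreSQL code: the JoinType enum is a cut-down stand-in, and compute_single_mode() and check_all_combinations() are hypothetical helpers I made up for the illustration. It assumes single_mode starts out false and is set only by the condition quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for PostgreSQL's JoinType. */
typedef enum { JOIN_INNER, JOIN_LEFT, JOIN_SEMI, JOIN_ANTI } JoinType;

/*
 * Models the quoted condition: single_mode can become true only when
 * inner_unique is false. Assumes nothing else sets single_mode.
 */
static bool
compute_single_mode(bool inner_unique, JoinType jointype)
{
    bool single_mode = false;

    if (!inner_unique && (jointype == JOIN_SEMI ||
                          jointype == JOIN_ANTI))
        single_mode = true;

    return single_mode;
}

/*
 * Checks the invariant for every (inner_unique, jointype) pair in the
 * model and returns the number of combinations examined.
 */
static int
check_all_combinations(void)
{
    const JoinType types[] = { JOIN_INNER, JOIN_LEFT, JOIN_SEMI, JOIN_ANTI };
    int checked = 0;

    for (int u = 0; u <= 1; u++)
    {
        for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++)
        {
            bool inner_unique = (u == 1);
            bool single_mode = compute_single_mode(inner_unique, types[i]);

            /* Same shape as the proposed Assert in get_memoize_path(). */
            assert(!(inner_unique && single_mode));
            checked++;
        }
    }

    return checked;
}
```

Calling check_all_combinations() from a small main() walks through all eight combinations without tripping the assert. In the real function the Assert would mainly guard against a future change that sets single_mode somewhere else.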
--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index c75408f552b..252cb712943 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -728,14 +728,16 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
single_mode = true;
/*
- * Memoize normally marks cache entries as complete when it runs out of
- * tuples to read from its subplan. However, with unique joins, Nested
- * Loop will skip to the next outer tuple after finding the first matching
- * inner tuple. Another case is a semi or anti join. If number of join
- * clauses, pushed to the inner as parameterised filter no less than the
- * number of join clauses, that means all the clauses have been pushed to
- * the inner and any tuple coming from the inner side will be successfully
- * used to build the join result.
+ * Normally, memoize marks cache entries as complete when it exhausts
+ * all tuples from its subplan. However, in unique joins, Nested Loop
+ * will skip to the next outer tuple after finding the first matching
+ * inner tuple.
+ * Another case is a SEMI or ANTI join. If the number of join clauses
+ * pushed to the inner side as parameterised filters is equal to or
+ * greater than the total number of join clauses, then all relevant
+ * join conditions have been applied on the inner side, so any returned
+ * inner tuple is guaranteed to satisfy the join condition, making it
+ * safe to memoize.
* This means that we may not read the inner side of the
* join to completion which leaves no opportunity to mark the cache entry
* as complete. To work around that, when the join is unique we
@@ -808,6 +810,8 @@ get_memoize_path(PlannerInfo *root, RelOptInfo *innerrel,
&hash_operators,
&binary_mode))
{
+ Assert(!(extra->inner_unique && single_mode));
+
return (Path *) create_memoize_path(root,
innerrel,
inner_path,