On Thu, Mar 20, 2025 at 3:02 PM David Rowley <dgrowle...@gmail.com> wrote: > For making this work, I think the attached should be about the guts of > the code changes. I didn't look at the comments. Right now I can't > think of any reason why this can't be done, but some experimentation > might reveal some reason that it can't.
I conducted some experiments, and I'm afraid it's not safe to consider Memoize for semi or anti joins, unless the inner side is provably unique. As an example, please consider: create table t (a int, b boolean); insert into t select i%3, false from generate_series(1,100)i; analyze t; select * from t t1 where t1.a in (select a from t t2 where t2.b in (select t1.b from t t3 where t2.a > 1 offset 0)); ERROR: cache entry already complete With the proposed patch, this query results in an error. The problem is that join clauses from the upper level may be moved to the semi join. For a given outer tuple, the first inner tuple that satisfies the current parameters will mark the cache entry as complete because singlerow is set to true. However, if that inner tuple and the current outer tuple don't satisfy the join clauses, the second inner tuple that satisfies the parameters will complain that the cache entry is already marked as complete. If the inner side is provably unique, there will be no such problem, as there won't be a second matching tuple. OTOH, in this case, the semi join will be reduced to an inner join by reduce_unique_semijoins. Therefore, it doesn't make much sense to prove inner_unique for semi joins in add_paths_to_joinrel. Perhaps we could spend some planner cycles proving inner_unique for anti joins, so that Memoize nodes can be considered for them? Thanks Richard