To re-familiarize myself with this patch, I've been re-reading the thread, which has gotten quite long. It seemed like it would be a good idea to stop and try to summarize what the patch ought to accomplish, because there's been some drift over the more than 2 years the patch has been in the works.
So: ISTM there are two core ideas at this point. 1. We should recognize when the inner side of a join is unique (that is, there is provably no more than one inner tuple joining to any given outer tuple), and make use of that knowledge to short-circuit execution in the same way we already do for JOIN_SEMI and JOIN_ANTI cases. That is, once we find a match we can immediately move on to the next outer tuple rather than continuing to scan for inner-side matches. 2. In these same cases (unique/semi/anti joins), it is possible to avoid mark/restore overhead in a mergejoin, because we can tweak the executor logic to not require backing up the inner side. This goes further than just tweaking the executor logic, though, because if we know we don't need mark/restore then that actually makes some plan shapes legal that weren't before: we don't need to interpose a Material node to protect join inputs that can't mark/restore. Maybe I missed something, but it doesn't look like the current patch (unique_joins_2017-01-27_no_outer_unique.patch) has anything concerning point #2 at all. It might make sense to address that idea as a follow-on patch, but I think it can be a quite significant win and we shouldn't just lose track of it. Anyway, having laid out that scope of work, I have some concerns: * The patch applies point #1 to only INNER and LEFT join modes, but I don't really see why the idea wouldn't work for RIGHT and FULL modes, ie the optimization seems potentially interesting for all executable join types. Once you've got a match, you can immediately go to the next outer tuple instead of continuing to scan inner. (Am I missing something?) * Particularly in view of the preceding point, I'm not that happy with the way that management/caching of the "is it unique" knowledge is done completely differently for INNER and LEFT joins. I wonder if there's actually a good argument for that or is it mostly a development sequence artifact. IOW, would it hurt to drop the SpecialJoinInfo tie-in and just rely on the generic cache? * Because of where we apply the short-circuit logic in the executor, it's only safe to consider the inner rel as unique if it is provably unique using only the join clauses that drive the primary join mechanism (ie, the "joinquals" not the "otherquals"). We already do ignore quals that are pushed-down to an outer join, so that's good, but there's an oversight: we will use a qual that is mergejoinable even if it's not hashjoinable. That means we could get the wrong answers in a hash join. I think probably the appropriate fix for the moment is just to consider only clauses that are both mergeable and hashable while trying to prove uniqueness. We do have some equality operators that support only one or the other, but they're corner cases, and I'm dubious that it's worth having to make separate proofs for merge and hash joins in order to cater to those cases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers