On 7 April 2017 at 07:26, Tom Lane <t...@sss.pgh.pa.us> wrote: > I'm looking through this, and I'm failing to see where it deals with > the problem we discussed last time, namely that you can't apply the > optimization unless all clauses that were used in the uniqueness > proof are included in the join's merge/hash conditions + joinquals.
Many thanks for looking at this again. The test in question is in nodeMergeJoin.c. I believe the join is still unique no matter where the clauses are evaluated. It should be up to the executor code to make use of the knowledge how it sees fit. The planner should not make assumptions on how the executor will make use of this knowledge. I've carefully crafted a comment in nodeMergejoin.c which explains all of this, and also about the limitations and where things might be improved later. The code in question is: mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual && mergestate->js.first_inner_tuple_only; > I don't especially like the centralized unique_rels cache structure. > It's not terribly clear what it's for, and you're making uncomfortably > large assumptions about never indexing off the end of the array, despite > not having any range checks for the subscripts. Wouldn't it be better to > add simple List fields into RelOptInfo, representing the outer rels this > rel has been proven unique or not-unique for? That would dodge the array > size question and would be more readily extensible to someday applying > this to join rels . hmm, perhaps bounds checking could be done, but it's no worse than planner_rt_fetch(). I don't really think the List idea would be nearly as efficient. The array provides a direct lookup for the List of proof relations. A List of List list would require a linear lookup just to find the correct List, then the existing linear lookup to find the proofs. The cache is designed to be fast. Slowing it down seems like a bad idea. Perhaps throwing it away would be better, since it's not required and was only added as an optimisation. The non_unique_rels will most often have a NULL bitmap set due to the incremental join search by the standard planner. So access to this as an array should be very fast, as we'll quickly realise there are no proofs to be found. > I also think some more thought is needed about handling JOIN_UNIQUE_OUTER > and JOIN_UNIQUE_INNER cases. In the first place, the patch cavalierly > violates the statement in joinpath.c that those jointype values never > propagate outside that module. In the second place, shouldn't > JOIN_UNIQUE_INNER automatically result in the path getting marked > inner_unique? I suspect the logic in add_paths_to_joinrel ought to > look something like > > if (jointype == JOIN_UNIQUE_INNER) > extra.inner_unique = true; > else > extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel, > (jointype == JOIN_UNIQUE_OUTER ? JOIN_INNER : > jointype), > restrictlist); hmm, innerrel_is_unique() is without prejudice as to the jointypes it supports, so much so that the argument is completely ignored. It was probably left over from some previous incarnation of the patch. If we treat JOIN_UNIQUE_INNER specially, then we'd better be sure that it's made unique on the RHS join quals. It looks like create_unique_path() uses sjinfo->semi_rhs_exprs to uniquify the relation, and compute_semijoin_info() seems to take all of the join conditions there or nothing at all, so I think it's safe to automatically mark JOIN_UNIQUE_INNERs this way. Updated patch is attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
unique_joins_2017-04-07.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers