On 7 April 2017 at 11:47, Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <david.row...@2ndquadrant.com> writes: >> On 7 April 2017 at 07:26, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> I'm looking through this, and I'm failing to see where it deals with >>> the problem we discussed last time, namely that you can't apply the >>> optimization unless all clauses that were used in the uniqueness >>> proof are included in the join's merge/hash conditions + joinquals. > >> The code in question is: >> mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual && >> mergestate->js.first_inner_tuple_only; > > Uh, AFAICS that only protects the skip-mark-and-restore logic. > What I'm on about is that you can't do the early advance to the > next outer tuple unless you're sure that all the quals that were > relevant to the uniqueness proof have been checked for the current > inner tuple. That affects all three join types not only merge.
Well, look at the join code and you'll see this only happens after the joinqual is evaulated. I didn't make a special effort here. I just borrowed the location that JOIN_SEMI was already using. For example, from hash join: if (joinqual == NULL || ExecQual(joinqual, econtext)) { node->hj_MatchedOuter = true; HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple)); /* In an antijoin, we never return a matched tuple */ if (node->js.jointype == JOIN_ANTI) { node->hj_JoinState = HJ_NEED_NEW_OUTER; continue; } /* * Skip to the next outer tuple if we only need to join to * the first inner matching tuple. */ if (node->js.first_inner_tuple_only) node->hj_JoinState = HJ_NEED_NEW_OUTER; Note the first line and the final two lines. Here's the one from nested loop: if (ExecQual(joinqual, econtext)) { node->nl_MatchedOuter = true; /* In an antijoin, we never return a matched tuple */ if (node->js.jointype == JOIN_ANTI) { node->nl_NeedNewOuter = true; continue; /* return to top of loop */ } /* * Skip to the next outer tuple if we only need to join to the * first inner matching tuple. */ if (node->js.first_inner_tuple_only) node->nl_NeedNewOuter = true; Again, note the first line and final 2 lines. > The case that would be relevant to this is, eg, > > create table t1 (f1 int, f2 int, primary key(f1,f2)); > > select * from t_outer left join t1 on (t_outer.f1 = t1.f1) > where t_outer.f2 = t2.f2; hmm, that query is not valid, unless you have created some table named t_outer. I don't know how you've defined that. So I guess you must have meant: postgres=# explain verbose select * from t1 t_outer left join t1 on (t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2; QUERY PLAN --------------------------------------------------------------------------- Hash Join (cost=66.50..133.57 rows=128 width=16) Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2 Inner Unique: Yes Hash Cond: ((t_outer.f1 = t1.f1) AND (t_outer.f2 = t1.f2)) -> Seq Scan on public.t1 t_outer (cost=0.00..32.60 rows=2260 width=8) Output: t_outer.f1, t_outer.f2 -> Hash (cost=32.60..32.60 rows=2260 width=8) Output: t1.f1, t1.f2 -> Seq Scan on public.t1 (cost=0.00..32.60 rows=2260 width=8) Output: t1.f1, t1.f2 (10 rows) Which did become an INNER JOIN due to the strict W If you'd had done: postgres=# explain verbose select * from t1 t_outer left join t1 on (t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2 or t1.f1 is null; QUERY PLAN ------------------------------------------------------------------------------------------------ Merge Left Join (cost=0.31..608.67 rows=255 width=16) Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2 Inner Unique: No Merge Cond: (t_outer.f1 = t1.f1) Filter: ((t_outer.f2 = t1.f2) OR (t1.f1 IS NULL)) -> Index Only Scan using t1_pkey on public.t1 t_outer (cost=0.16..78.06 rows=2260 width=8) Output: t_outer.f1, t_outer.f2 -> Materialize (cost=0.16..83.71 rows=2260 width=8) Output: t1.f1, t1.f2 -> Index Only Scan using t1_pkey on public.t1 (cost=0.16..78.06 rows=2260 width=8) Output: t1.f1, t1.f2 (11 rows) You'll notice that "Inner Unique: No" > Your existing patch would think t1 is unique-inner, but the qual pushed > down from WHERE would not be a joinqual so the wrong thing would happen > at runtime. > > (Hm ... actually, this example wouldn't fail as written because > the WHERE qual is probably strict, so the left join would get > reduced to an inner join and then pushed-down-ness no longer > matters. But hopefully you get my drift.) Oh yeah. I get it, but that's why we ignore !can_join clauses /* Ignore if it's not a mergejoinable clause */ if (!restrictinfo->can_join || restrictinfo->mergeopfamilies == NIL) continue; /* not mergejoinable */ no? >> I don't really think the List idea would be nearly as efficient. > > No, what I'm saying is that each RelOptInfo would contain a single List of > Relids of proven-unique-for outer rels (and another one for the negative > cache). No array, no more searching than you have now, just removal of an > uncertainly-safe array fetch. That's way cleaner. Thanks. I've changed it that way. Feeling silly now for having done it the original way. Updated patch is attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
unique_joins_2017-04-07a.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers