On 27 January 2017 at 12:39, Tom Lane <t...@sss.pgh.pa.us> wrote: > 2. In these same cases (unique/semi/anti joins), it is possible to avoid > mark/restore overhead in a mergejoin, because we can tweak the executor > logic to not require backing up the inner side. This goes further than > just tweaking the executor logic, though, because if we know we don't > need mark/restore then that actually makes some plan shapes legal that > weren't before: we don't need to interpose a Material node to protect > join inputs that can't mark/restore.
I've made modifications in the attached to add this optimization, and it's quite a significant improvement. Setup: create table t1 (id text primary key); insert into t1 select x from generate_series(1,1000000) x(x); create table t2 (id text primary key); insert into t2 select x from generate_series(1,1000000) x(x); vacuum freeze; Query: select count(*) from t1 inner join t2 on t1.id=t2.id; Unpatched Time: 369.618 ms Time: 358.208 ms Time: 357.094 ms Time: 355.642 ms Time: 358.193 ms Time: 371.272 ms Time: 354.386 ms Time: 364.277 ms Time: 346.091 ms Time: 358.757 ms Patched Time: 273.154 ms Time: 258.520 ms Time: 269.456 ms Time: 252.861 ms Time: 271.015 ms Time: 252.567 ms Time: 271.132 ms Time: 267.505 ms Time: 265.295 ms Time: 257.068 ms About 26.5% improvement for this case. > * The patch applies point #1 to only INNER and LEFT join modes, but > I don't really see why the idea wouldn't work for RIGHT and FULL modes, > ie the optimization seems potentially interesting for all executable join > types. Once you've got a match, you can immediately go to the next outer > tuple instead of continuing to scan inner. (Am I missing something?) No I believe I started development with LEFT JOINs, moved to INNER, then didn't progress to think of RIGHT or FULL. I've lifted this restriction in the patch. It seems no other work is required to have that just work. > * Particularly in view of the preceding point, I'm not that happy with > the way that management/caching of the "is it unique" knowledge is > done completely differently for INNER and LEFT joins. I wonder if > there's actually a good argument for that or is it mostly a development > sequence artifact. IOW, would it hurt to drop the SpecialJoinInfo > tie-in and just rely on the generic cache? I agree that special handling of one join type is not so pretty. However, LEFT JOINs still remain a bit special as they're the only ones we currently perform join removal on, and the patch modifies that code to make use of the new flag for those. This can improve planner performance of join removal when a join is removed successfully, as the previous code had to recheck uniqueness of each remaining LEFT JOIN again, whereas the new code only checks uniqueness of ones not previously marked as unique. This too likely could be done with the cache, although I'm a bit concerned with populating the cache, then performing a bunch of LEFT JOIN removals and leaving relids in the cache which no longer exist. Perhaps it's OK. I've just not found proofs in my head yet that it is. > * Because of where we apply the short-circuit logic in the executor, > it's only safe to consider the inner rel as unique if it is provably > unique using only the join clauses that drive the primary join mechanism > (ie, the "joinquals" not the "otherquals"). We already do ignore quals > that are pushed-down to an outer join, so that's good, but there's an > oversight: we will use a qual that is mergejoinable even if it's not > hashjoinable. That means we could get the wrong answers in a hash join. > I think probably the appropriate fix for the moment is just to consider > only clauses that are both mergeable and hashable while trying to prove > uniqueness. We do have some equality operators that support only one > or the other, but they're corner cases, and I'm dubious that it's worth > having to make separate proofs for merge and hash joins in order to > cater to those cases. hmm. I'm having trouble understanding why this is a problem for Unique joins, but not for join removal? However you mentioning this cause me to notice that this is only true in the patch for left joins, and the other join types are not consistent with that. create table a1 (a int, b int, primary key(a,b)); create table a2 (a int, b int, primary key(a,b)); explain (verbose, costs off) select * from a1 left join a2 on a1.a = a2.a and a2.b=1 ; QUERY PLAN -------------------------------------------------- Merge Left Join Output: a1.a, a1.b, a2.a, a2.b Inner Unique: Yes Merge Cond: (a1.a = a2.a) -> Index Only Scan using a1_pkey on public.a1 Output: a1.a, a1.b -> Sort Output: a2.a, a2.b Sort Key: a2.a -> Bitmap Heap Scan on public.a2 Output: a2.a, a2.b Recheck Cond: (a2.b = 1) -> Bitmap Index Scan on a2_pkey Index Cond: (a2.b = 1) explain (verbose, costs off) select * from a1 inner join a2 on a1.a = a2.a and a2.b=1 ; QUERY PLAN -------------------------------------------------- Nested Loop Output: a1.a, a1.b, a2.a, a2.b Inner Unique: No -> Bitmap Heap Scan on public.a2 Output: a2.a, a2.b Recheck Cond: (a2.b = 1) -> Bitmap Index Scan on a2_pkey Index Cond: (a2.b = 1) -> Index Only Scan using a1_pkey on public.a1 Output: a1.a, a1.b Index Cond: (a1.a = a2.a) Notice the inner join is not detected as unique, but the left join is. This is still not right in the attached. I'm not quite sure what to do about it yet. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
unique_joins_2017-01-28.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers