On Sun, Nov 16, 2014 at 10:09 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 15 October 2014 11:03, David Rowley <dgrowle...@gmail.com> wrote: > > > The explain analyze from the above query looks like: > > test=# explain (analyze, costs off, timing off) select count(*) from t1 > > inner join t2 on t1.t2_id=t2.id; > > QUERY PLAN > > ------------------------------------------------------------------ > > Aggregate (actual rows=1 loops=1) > > -> Nested Loop (actual rows=1000000 loops=1) > > -> Seq Scan on t1 (actual rows=1000000 loops=1) > > -> Index Only Scan using t2_pkey on t2 (never executed) > > Index Cond: (id = t1.t2_id) > > Heap Fetches: 0 > > Execution time: 124.990 ms > > (7 rows) > > > > As you can see the scan on t2 never occurred. > > Very good, happy to see this happening (yay FKs!) and with > PostgreSQL-style rigour. > > :) > I've reviewed the patch from cold and I have a few comments. > > Thanks! > The plan you end up with here works quite differently from left outer > join removal, where the join is simply absent. That inconsistency > causes most of the other problems I see. > > I propose that we keep track of whether there are any potentially > skippable joins at the top of the plan. When we begin execution we do > a single if test to see if there is run-time work to do. If we pass > the run-time tests we then descend the tree and prune the plan to > completely remove unnecessary nodes. We end with an EXPLAIN and > EXPLAIN ANALYZE that looks like this > > > QUERY PLAN > > ------------------------------------------------------------------ > > Aggregate (actual rows=1 loops=1) > > -> Seq Scan on t1 (actual rows=1000000 loops=1) > > Doing that removes all the overheads and complexity; it also matches > how join removal currently works. > > This sounds much cleaner than what I have at the moment, although, you say EXPLAIN would look like that... I don't think that's quite true as the EXPLAIN still would have the un-pruned version, as the pruning would be done as executor start-up. Would it cause problems to have the EXPLAIN have a different looking plan than EXPLAIN ANALYZE? I'll need to look into how the plan is stored in the case of PREPARE statements, as no doubt I can't go vandalising any plans that are stored in the PREPARE hashtable. I'd need to make a copy first, unless that's already done for me. But I guess I'd only have to do that if some flag on PlannerInfo hasSkippableNodes was true, so likely the overhead of such a copy would be regained by skipping some joins. > The alternative is accepting some pretty horrible additional code in > most join types, plus a small regression on nested loop joins which I > would have to call out as regrettably unacceptable. (Horrible in this > sense that we don't want that code, not that David's code is poor). > > Yeah it is quite horrid. I did try and keep it as simple and as non-invasive as possible, but for nest loop it seemed there was just no better way. > The tests on the patch are pretty poor. If we should use EXPLAINs to > show a join removal that works and a join removal that fails. With a > few of the main permutations. > > Agreed. To be honest I abandoned the tests due to a problem with EXPLAIN ANALYZE outputting the variable timing information at the bottom. There's no way to disable this! So that makes testing much harder. I added myself to the list of complainers over here -> http://www.postgresql.org/message-id/caaphdvovzbtzljbd9vfaznwo6jook1k6-7rfq8zym9h7ndc...@mail.gmail.com but the proposed solution (diff tool which supports regex matching) is not all that simple, and I've not gotten around to attempting to make one yet. I've also attached a rebased patch, as the old one is no longer applying. Regards David Rowley
inner_join_removals_2014-11-16_3a40b4f.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers