Re: Parallel Append can break run-time partition pruning

David Rowley Tue, 21 Apr 2020 20:23:14 -0700

On Tue, 21 Apr 2020 at 15:03, David Rowley <dgrowle...@gmail.com> wrote:
> I wonder if the fix should be more something along the lines of trying
> to merge things do we only generate a single partial path.  That way
> we wouldn't be at the mercy of the logic in add_partial_path() to
> accept or reject the path based on the order the paths are added.


I took a shot at doing things this way.

First, I'll recap on the problem this is trying to solve:

add_paths_to_append_rel() attempts to create two separate partial
Append paths. I'll describe both of these below:

Path1: This path is generated regardless of if Parallel Append is
enabled and contains all the cheapest partial paths from each child
relation.   If parallel append is enabled this will become a Parallel
Append. If it's not then a non-parallel append will be created
containing the list of partial subpaths. Here's an example from
select_parallel.out:

                          QUERY PLAN
--------------------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 1
         ->  Partial Aggregate
               ->  Append
                     ->  Parallel Seq Scan on a_star a_star_1
                     ->  Parallel Seq Scan on b_star a_star_2
                     ->  Parallel Seq Scan on c_star a_star_3
                     ->  Parallel Seq Scan on d_star a_star_4
                     ->  Parallel Seq Scan on e_star a_star_5
                     ->  Parallel Seq Scan on f_star a_star_6

Path2: We only ever consider this one when enable_parallel_append ==
true and the append rel's consider_parallel == true.  When this path
is generated, it'll always be for a Parallel Append.  This path may
contain a mix of partial paths for child rels and parallel_safe child
paths, of which will only be visited by a single worker.

The problem is that path1 does not pullup child Appends when the child
append path contains a mix of partial and parallel safe paths (i.e a
sub-path2, per above). Since we create path2 in addition to path1, the
costs come out the same even though path1 couldn't pullup the
sub-Append paths. Unfortunately, the costs are the same so path1 is
prefered since it's added first. add_partial_path() just rejects path2
based on it being too similar in cost to the existing path1.

In the attached, I'm trying to solve this by only created 1 partial
Append path in the first place. This path will always try to use the
cheapest partial path, or the cheapest parallel safe path, if parallel
append is allowed and that path is cheaper than the cheapest partial
path.

I believe the attached gives us what we want and additionally, since
it should now always pullup the sub-Appends, then there's no need to
consider adjusting partitioned_rels based on if the pull-up occurred
or not. Those should now be right in all cases. This should also fix
the run-time pruning issue too since in my original test case it'll
pullup the sub-Append which means that the code added in 8edd0e794 is
no longer going to do anything with it as the top-level Append will
never contain just 1 subpath.

I'm reasonably certain that this is correct, but I did find it a bit
mind-bending considering all the possible cases, so it could do with
some more eyes on it.   I've not really done a final polish of the
comments yet. I'll do that if the patch is looking promising.

The output of the original test with the attached is as follows:

postgres=# explain (analyze on, costs off, timing off, summary off)
postgres-# select * from list where a = (select 1) and b > 0;
                                QUERY PLAN
---------------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Parallel Seq Scan on list_12_2 list_2 (never executed)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
(11 rows)


postgres=# -- force the 2nd subnode of the Append to be non-parallel.
postgres=# alter table list_12_1 set (parallel_workers=0);
ALTER TABLE
postgres=# explain (analyze on, costs off, timing off, summary off)
postgres-# select * from list where a = (select 1) and b > 0;
                             QUERY PLAN
--------------------------------------------------------------------
 Gather (actual rows=0 loops=1)
   Workers Planned: 2
   Params Evaluated: $0
   Workers Launched: 2
   InitPlan 1 (returns $0)
     ->  Result (actual rows=1 loops=1)
   ->  Parallel Append (actual rows=0 loops=3)
         ->  Seq Scan on list_12_1 list_1 (actual rows=0 loops=1)
               Filter: ((b > 0) AND (a = $0))
         ->  Parallel Seq Scan on list_12_2 list_2 (never executed)
               Filter: ((b > 0) AND (a = $0))
(11 rows)

Notice we get "(never executed)" in both cases.

David

always_pullup_child_appends_to_top_level_append.patch
Description: Binary data

Re: Parallel Append can break run-time partition pruning

Reply via email to