On Tue, 21 Apr 2020 at 15:03, David Rowley <dgrowle...@gmail.com> wrote: > I wonder if the fix should be more something along the lines of trying > to merge things do we only generate a single partial path. That way > we wouldn't be at the mercy of the logic in add_partial_path() to > accept or reject the path based on the order the paths are added.
I took a shot at doing things this way. First, I'll recap on the problem this is trying to solve: add_paths_to_append_rel() attempts to create two separate partial Append paths. I'll describe both of these below: Path1: This path is generated regardless of if Parallel Append is enabled and contains all the cheapest partial paths from each child relation. If parallel append is enabled this will become a Parallel Append. If it's not then a non-parallel append will be created containing the list of partial subpaths. Here's an example from select_parallel.out: QUERY PLAN -------------------------------------------------------------- Finalize Aggregate -> Gather Workers Planned: 1 -> Partial Aggregate -> Append -> Parallel Seq Scan on a_star a_star_1 -> Parallel Seq Scan on b_star a_star_2 -> Parallel Seq Scan on c_star a_star_3 -> Parallel Seq Scan on d_star a_star_4 -> Parallel Seq Scan on e_star a_star_5 -> Parallel Seq Scan on f_star a_star_6 Path2: We only ever consider this one when enable_parallel_append == true and the append rel's consider_parallel == true. When this path is generated, it'll always be for a Parallel Append. This path may contain a mix of partial paths for child rels and parallel_safe child paths, of which will only be visited by a single worker. The problem is that path1 does not pullup child Appends when the child append path contains a mix of partial and parallel safe paths (i.e a sub-path2, per above). Since we create path2 in addition to path1, the costs come out the same even though path1 couldn't pullup the sub-Append paths. Unfortunately, the costs are the same so path1 is prefered since it's added first. add_partial_path() just rejects path2 based on it being too similar in cost to the existing path1. In the attached, I'm trying to solve this by only created 1 partial Append path in the first place. This path will always try to use the cheapest partial path, or the cheapest parallel safe path, if parallel append is allowed and that path is cheaper than the cheapest partial path. I believe the attached gives us what we want and additionally, since it should now always pullup the sub-Appends, then there's no need to consider adjusting partitioned_rels based on if the pull-up occurred or not. Those should now be right in all cases. This should also fix the run-time pruning issue too since in my original test case it'll pullup the sub-Append which means that the code added in 8edd0e794 is no longer going to do anything with it as the top-level Append will never contain just 1 subpath. I'm reasonably certain that this is correct, but I did find it a bit mind-bending considering all the possible cases, so it could do with some more eyes on it. I've not really done a final polish of the comments yet. I'll do that if the patch is looking promising. The output of the original test with the attached is as follows: postgres=# explain (analyze on, costs off, timing off, summary off) postgres-# select * from list where a = (select 1) and b > 0; QUERY PLAN --------------------------------------------------------------------------- Gather (actual rows=0 loops=1) Workers Planned: 2 Params Evaluated: $0 Workers Launched: 2 InitPlan 1 (returns $0) -> Result (actual rows=1 loops=1) -> Parallel Append (actual rows=0 loops=3) -> Parallel Seq Scan on list_12_2 list_2 (never executed) Filter: ((b > 0) AND (a = $0)) -> Parallel Seq Scan on list_12_1 list_1 (actual rows=0 loops=1) Filter: ((b > 0) AND (a = $0)) (11 rows) postgres=# -- force the 2nd subnode of the Append to be non-parallel. postgres=# alter table list_12_1 set (parallel_workers=0); ALTER TABLE postgres=# explain (analyze on, costs off, timing off, summary off) postgres-# select * from list where a = (select 1) and b > 0; QUERY PLAN -------------------------------------------------------------------- Gather (actual rows=0 loops=1) Workers Planned: 2 Params Evaluated: $0 Workers Launched: 2 InitPlan 1 (returns $0) -> Result (actual rows=1 loops=1) -> Parallel Append (actual rows=0 loops=3) -> Seq Scan on list_12_1 list_1 (actual rows=0 loops=1) Filter: ((b > 0) AND (a = $0)) -> Parallel Seq Scan on list_12_2 list_2 (never executed) Filter: ((b > 0) AND (a = $0)) (11 rows) Notice we get "(never executed)" in both cases. David
always_pullup_child_appends_to_top_level_append.patch
Description: Binary data