On 13 December 2017 at 00:33, Beena Emerson <memissemer...@gmail.com> wrote: > PFA the updated patch, this can be applied over the v13 patches [1] > over commit 487a0c1518af2f3ae2d05b7fd23d636d687f28f3
Hi Beena, Thanks for posting an updated patch. I've been looking over this and I think that the use of the PartitionDispatch in set_append_subplan_indexes is not correct. What we need here is the index of the Append's subnode and that's not what RelationGetPartitionDispatchInfo() gives you. Remember that some partitions could have been pruned away already during planning. This quick example shows that the partition selection is not correct. create table p (a int, b int) partition by range (a); create table p_a_neg partition of p for values from (minvalue) to (0) partition by range (b); create table p_a_pos partition of p for values from (0) to (maxvalue) partition by range (b); create table p_a_neg_b_neg partition of p_a_neg for values from (minvalue) to (0); create table p_a_neg_b_pos partition of p_a_neg for values from (0) to (maxvalue); create table p_a_pos_b_neg partition of p_a_pos for values from (minvalue) to (0); create table p_a_pos_b_pos partition of p_a_pos for values from (0) to (maxvalue); prepare q1 (int, int) as select * from p where a = $1 and b = $1; explain analyze execute q1 (-1,-1); -- this works. QUERY PLAN -------------------------------------------------------------------------------------------------------------- Append (cost=0.00..175.60 rows=4 width=8) (actual time=1.099..1.099 rows=0 loops=1) Runtime Partition Pruning: ((a = $1) AND (b = $1)) -> Seq Scan on p_a_neg_b_neg (cost=0.00..43.90 rows=1 width=8) (actual time=0.023..0.023 rows=0 loops=1) Filter: ((a = $1) AND (b = $1)) -> Seq Scan on p_a_neg_b_pos (cost=0.00..43.90 rows=1 width=8) (never executed) Filter: ((a = $1) AND (b = $1)) -> Seq Scan on p_a_pos_b_neg (cost=0.00..43.90 rows=1 width=8) (never executed) Filter: ((a = $1) AND (b = $1)) -> Seq Scan on p_a_pos_b_pos (cost=0.00..43.90 rows=1 width=8) (never executed) Filter: ((a = $1) AND (b = $1)) (12 rows) explain analyze execute q1 (-1,1); -- should scan p_a_neg_b_pos, but does not. QUERY PLAN -------------------------------------------------------------------------------------------------------------- Append (cost=0.00..175.60 rows=4 width=8) (actual time=758996.359..758996.359 rows=0 loops=1) Runtime Partition Pruning: ((a = $1) AND (b = $1)) -> Seq Scan on p_a_neg_b_neg (cost=0.00..43.90 rows=1 width=8) (actual time=0.056..0.056 rows=0 loops=1) Filter: ((a = $1) AND (b = $1)) -> Seq Scan on p_a_neg_b_pos (cost=0.00..43.90 rows=1 width=8) (never executed) Filter: ((a = $1) AND (b = $1)) -> Seq Scan on p_a_pos_b_neg (cost=0.00..43.90 rows=1 width=8) (never executed) Filter: ((a = $1) AND (b = $1)) -> Seq Scan on p_a_pos_b_pos (cost=0.00..43.90 rows=1 width=8) (never executed) Filter: ((a = $1) AND (b = $1)) (12 rows) So, I started to look at what the best way to put this right might be. I see that since Parallel Append was committed that the subnodes are now sorted in cost order inside create_append_path(), so likely we'll want to delay figuring out the subpath list indexes until after that's done since sorting would scramble our index arrays. We could simply look at the subpaths at the end of create_append_path() and create some sort of new matrix type that can accept the output of Amit's get_partitions_from_clauses() and translate that Bitmapset into the subpath indexes (another Bitmapset). This will also need to work for sub-partitions too, so this matrix must be some sort of tree that we can descend into when we see that get_partitions_from_clauses returned a bit for a sub-partition instead of a leaf-partition. I bashed this idea around a bit and I came up with the attached. It's very far from complete and in a very WIP state. I've not really done anything to make the correct clause list available in nodeAppend.c yet, but I think the code that's there is worthy of a look. I've not done that much work on the new choose_next_subplan* functions in nodeAppend.c. I just modified choose_next_subplan_locally to show how this set of functions need to take into account the subnode bitmap set of valid partitions to scan. Perhaps some special case is needed to have these functions ignore the Bitmapset when runtime pruning is disabled (perhaps a completely new set of the functions is needed to support the selection of the next non-pruned partition). Although, probably that can be debated a bit later as it's a fairly minor detail for now. My patch also lacks any means to extract the Params during match_clauses_to_partkey(), or at least most of the cases. I've just added 1 case there. I did this because I thought it was better to extract the ParamIds rather than a bool to say we've matched params. This way we can only reevaluate which subplans to look at on rescan of an Append if and only if the params we actually care about have changed. I've not given this part a huge amount of thought yet. I'm a little unsure where to go from here. Obviously, this is quite a major rewrite of your patch. The parts that I've got missing likely can use quite a bit of the stuff you've already written, but that needs some review. I wanted to post this now as I know you're busy working on this to rebase it on parallel Append and to also address Robert's concerns, which all seem valid to me. What's the best way for us to coordinate our efforts on this? Maybe you could look at my patch and sanity check it to ensure I'm not taking anything in the wrong direction? I also think that MergeAppend needs similar work, but I think it's best to get the Append case working first, or perhaps that's another patch... Please find attached my patch, which is based directly on top of Amit's faster partition pruning v14 [1], which I patched against 4034db215b9 [1] https://www.postgresql.org/message-id/9b98fc47-34b8-0ab6-27fc-c8a0889f2e5b%40lab.ntt.co.jp -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
runtime_prune_drowley_v1.patch
Description: Binary data