Hi David, Thank you for reviewing and looking at this.
I have attached the WIP patch which incorporates some of Robert's comments and is rebased over Amit's v14 patch. On Sat, Dec 16, 2017 at 11:35 AM, David Rowley <david.row...@2ndquadrant.com> wrote: > On 13 December 2017 at 00:33, Beena Emerson <memissemer...@gmail.com> wrote: >> PFA the updated patch, this can be applied over the v13 patches [1] >> over commit 487a0c1518af2f3ae2d05b7fd23d636d687f28f3 > > Hi Beena, > > Thanks for posting an updated patch. > > I've been looking over this and I think that the use of the > PartitionDispatch in set_append_subplan_indexes is not correct. What > we need here is the index of the Append's subnode and that's not what > RelationGetPartitionDispatchInfo() gives you. Remember that some > partitions could have been pruned away already during planning. > > This quick example shows that the partition selection is not correct. > > create table p (a int, b int) partition by range (a); > > create table p_a_neg partition of p for values from (minvalue) to (0) > partition by range (b); > create table p_a_pos partition of p for values from (0) to (maxvalue) > partition by range (b); > > create table p_a_neg_b_neg partition of p_a_neg for values from > (minvalue) to (0); > create table p_a_neg_b_pos partition of p_a_neg for values from (0) to > (maxvalue); > > create table p_a_pos_b_neg partition of p_a_pos for values from > (minvalue) to (0); > create table p_a_pos_b_pos partition of p_a_pos for values from (0) to > (maxvalue); > > prepare q1 (int, int) as select * from p where a = $1 and b = $1; > > explain analyze execute q1 (-1,-1); -- this works. > QUERY PLAN > -------------------------------------------------------------------------------------------------------------- > Append (cost=0.00..175.60 rows=4 width=8) (actual time=1.099..1.099 > rows=0 loops=1) > Runtime Partition Pruning: ((a = $1) AND (b = $1)) > -> Seq Scan on p_a_neg_b_neg (cost=0.00..43.90 rows=1 width=8) > (actual time=0.023..0.023 rows=0 loops=1) > Filter: ((a = $1) AND (b = $1)) > -> Seq Scan on p_a_neg_b_pos (cost=0.00..43.90 rows=1 width=8) > (never executed) > Filter: ((a = $1) AND (b = $1)) > -> Seq Scan on p_a_pos_b_neg (cost=0.00..43.90 rows=1 width=8) > (never executed) > Filter: ((a = $1) AND (b = $1)) > -> Seq Scan on p_a_pos_b_pos (cost=0.00..43.90 rows=1 width=8) > (never executed) > Filter: ((a = $1) AND (b = $1)) > (12 rows) > > explain analyze execute q1 (-1,1); -- should scan p_a_neg_b_pos, but does not. > QUERY PLAN > -------------------------------------------------------------------------------------------------------------- > Append (cost=0.00..175.60 rows=4 width=8) (actual > time=758996.359..758996.359 rows=0 loops=1) > Runtime Partition Pruning: ((a = $1) AND (b = $1)) > -> Seq Scan on p_a_neg_b_neg (cost=0.00..43.90 rows=1 width=8) > (actual time=0.056..0.056 rows=0 loops=1) > Filter: ((a = $1) AND (b = $1)) > -> Seq Scan on p_a_neg_b_pos (cost=0.00..43.90 rows=1 width=8) > (never executed) > Filter: ((a = $1) AND (b = $1)) > -> Seq Scan on p_a_pos_b_neg (cost=0.00..43.90 rows=1 width=8) > (never executed) > Filter: ((a = $1) AND (b = $1)) > -> Seq Scan on p_a_pos_b_pos (cost=0.00..43.90 rows=1 width=8) > (never executed) > Filter: ((a = $1) AND (b = $1)) > (12 rows) > > So, I started to look at what the best way to put this right might be. > I see that since Parallel Append was committed that the subnodes are > now sorted in cost order inside create_append_path(), so likely we'll > want to delay figuring out the subpath list indexes until after that's > done since sorting would scramble our index arrays. We could simply > look at the subpaths at the end of create_append_path() and create > some sort of new matrix type that can accept the output of Amit's > get_partitions_from_clauses() and translate that Bitmapset into the > subpath indexes (another Bitmapset). This will also need to work for > sub-partitions too, so this matrix must be some sort of tree that we > can descend into when we see that get_partitions_from_clauses returned > a bit for a sub-partition instead of a leaf-partition. Yes, the change in sort order means that the current append_paths_array cannot be used for Parallel append and a new logic has to be devised. I have still not thought about it but your method seems like a good way to go. Currently I have worked on the Parallel bit considering that the appends_path_array holds the correct subplan_index. > > I bashed this idea around a bit and I came up with the attached. It's > very far from complete and in a very WIP state. I've not really done > anything to make the correct clause list available in nodeAppend.c > yet, but I think the code that's there is worthy of a look. I've not > done that much work on the new choose_next_subplan* functions in > nodeAppend.c. I just modified choose_next_subplan_locally to show how > this set of functions need to take into account the subnode bitmap set > of valid partitions to scan. Perhaps some special case is needed to > have these functions ignore the Bitmapset when runtime pruning is > disabled (perhaps a completely new set of the functions is needed to > support the selection of the next non-pruned partition). Although, > probably that can be debated a bit later as it's a fairly minor detail > for now. > > My patch also lacks any means to extract the Params during > match_clauses_to_partkey(), or at least most of the cases. I've just > added 1 case there. I did this because I thought it was better to > extract the ParamIds rather than a bool to say we've matched params. > This way we can only reevaluate which subplans to look at on rescan of > an Append if and only if the params we actually care about have > changed. I've not given this part a huge amount of thought yet. Currently, during ReScan the patch prunes whenever the Param changes and I had this 'rescan pruning optimization' in mind but had not worked on it. > > I'm a little unsure where to go from here. Obviously, this is quite a > major rewrite of your patch. The parts that I've got missing likely > can use quite a bit of the stuff you've already written, but that > needs some review. I wanted to post this now as I know you're busy > working on this to rebase it on parallel Append and to also address > Robert's concerns, which all seem valid to me. > > What's the best way for us to coordinate our efforts on this? Maybe > you could look at my patch and sanity check it to ensure I'm not > taking anything in the wrong direction? I have not seen the patch in depth but this approach seems good. Dilip has already worked on the join equality bug you pointed out before, I am yet to merge that patch and will work on the optimizer comments of Robert's. Maybe I can incorporate your patch while working on it. -- Beena Emerson EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company