On Tue, Mar 6, 2018 at 2:29 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke > <jeevan.cha...@enterprisedb.com> wrote: > > However, to perform Gather or Gather Merge once we have all partial paths > > ready, and to avoid too many existing code rearrangement, I am calling > > try_partitionwise_grouping() before we do any aggregation/grouping on > whole > > relation. By doing this, we will be having all partial paths in > > partially_grouped_rel and then existing code will do required > finalization > > along with any Gather or Gather Merge, if required. > > > > Please have a look over attached patch-set and let me know if it needs > > further changes. > > This does look better. > Thank you, Robert. > > + <term><varname>enable_partitionwise_agg</varname> > (<type>boolean</type>) > > Please don't abbreviate "aggregate" to "agg". > This is in-lined with enable_hashagg GUC. Do you think enable_partitionwise_aggregate seems better? But it will be not consistent with other GUC names like enable_hashagg then. > > - /* Build final grouping paths */ > - add_paths_to_grouping_rel(root, input_rel, grouped_rel, target, > - > partially_grouped_rel, agg_costs, > - > &agg_final_costs, gd, can_sort, can_hash, > - dNumGroups, > (List *) parse->havingQual); > + if (isPartialAgg) > + { > + Assert(agg_partial_costs && agg_final_costs); > + add_paths_to_partial_grouping_rel(root, input_rel, > + > partially_grouped_rel, > + > agg_partial_costs, > + > gd, can_sort, can_hash, > + > false, true); > + } > + else > + { > + double dNumGroups; > + > + /* Estimate number of groups. */ > + dNumGroups = get_number_of_groups(root, > + > cheapest_path->rows, > + > gd, > + > child_data ? make_tlist_from_pathtarget(target) : > parse->targetList); > + > + /* Build final grouping paths */ > + add_paths_to_grouping_rel(root, input_rel, grouped_rel, > target, > + > partially_grouped_rel, agg_costs, > + > agg_final_costs, gd, can_sort, can_hash, > + > dNumGroups, (List *) havingQual); > + } > > This looks strange. Why do we go down two completely different code > paths here? It is because when isPartialAgg = true we need to create partially aggregated non-partial paths which should be added in partially_grouped_rel->pathlist. And when isPartialAgg = false, we are creating fully aggregated paths which goes into grouped_rel->pathlist. > It seems to me that the set of paths we add to the > partial_pathlist shouldn't depend at all on isPartialAgg. I might be > confused, but it seems to me that any aggregate path we construct that > is going to run in parallel must necessarily be partial, because even > if each group will occur only in one partition, it might still occur > in multiple workers for that partition, so finalization would be > needed. Thats's true. We are creating partially aggregated partial paths for this and keeps them in partially_grouped_rel->partial_pathlist. > On the other hand, for non-partial paths, we can add then to > partially_grouped_rel when isPartialAgg = true and to grouped_rel when > isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE > vs. AGGSPLIT_INITIAL_SERIAL. Yes. As explained above, they goes in pathlist of respective Rel. However, PathTarget is different too, we need partial_pathtarget when isPartialAgg = true and also need agg_partial_costs. > But that doesn't appear to be what this > is doing. > So the code for doing partially aggregated partial paths and partially aggregated non-partial path is same except partial paths goes into partial_pathlist where as non-partial goes into pathlist of partially_grouped_rel. Thus, calling add_paths_to_partial_grouping_rel() when isPartialAgg = true seems correct. Also as we have decided, this function is responsible to create all partially aggregated paths including both partial and non-partial. Am I missing something? > > + /* > + * If there are any fully aggregated partial paths present, > may be because > + * of parallel Append over partitionwise aggregates, we must stick > a > + * Gather or Gather Merge path atop the cheapest partial path. > + */ > + if (grouped_rel->partial_pathlist) > > This comment is copied from someplace where the code does what the > comment says, but here it doesn't do any such thing. > Well, these comments are not present anywhere else than this place. With Parallel Append and Partitionwise aggregates, it is now possible to have fully aggregated partial paths now. And thus we need to stick a Gather and/or Gather Merge atop cheapest partial path. And I believe the code does the same. Am I missing something? > > More tomorrow... > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Jeevan Chalke Technical Architect, Product Development EnterpriseDB Corporation The Enterprise PostgreSQL Company