On 18 March 2016 at 01:22, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Thu, Mar 17, 2016 at 10:35 AM, David Rowley > <david.row...@2ndquadrant.com> wrote: >> >> On 17 March 2016 at 01:19, Amit Kapila <amit.kapil...@gmail.com> wrote: >> > Few assorted comments: >> > >> > 2. >> > AggPath * >> > create_agg_path(PlannerInfo *root, >> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root, >> > >> > List *groupClause, >> > List *qual, >> > const AggClauseCosts >> > *aggcosts, >> > - double numGroups) >> > + double numGroups, >> > + >> > bool combineStates, >> > + bool finalizeAggs) >> > >> > Don't you need to set parallel_aware flag in this function as we do for >> > create_seqscan_path()? >> >> I don't really know the answer to that... I mean there's nothing >> special done in nodeAgg.c if the node is running in a worker or in the >> main process. >> > > On again thinking about it, I think it is okay to set parallel_aware flag as > false. This flag means whether that particular node has any parallelism > behaviour which is true for seqscan, but I think not for partial aggregate > node. > > Few other comments on latest patch: > > 1. > > + /* > + * XXX does this need estimated for each partial path, or are they > + * all > going to be the same anyway? > + */ > + dNumPartialGroups = get_number_of_groups(root, > + > clamp_row_est(partial_aggregate_path->rows), > + > rollup_lists, > + > rollup_groupclauses); > > For considering partial groups, do we need to rollup related lists?
No it doesn't you're right. I did mean to remove these, but they're NIL anyway. Seems better to remove them to prevent confusion. > 2. > + hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path, > + > &agg_costs, > + > dNumPartialGroups); > + > + /* > + * Generate a > hashagg Path, if we can, but we'll skip this if the hash > + * table looks like it'll exceed work_mem. > + > */ > + if (can_hash && hashaggtablesize < work_mem * 1024L) > > > hash table size should be estimated only if can_hash is true. Good point. Changed. > > 3. > + foreach(lc, grouped_rel->partial_pathlist) > + { > + Path *path = > (Path *) lfirst(lc); > + double total_groups; > + > + total_groups = path- >>parallel_degree * path->rows; > + > + path = (Path *) create_gather_path(root, grouped_rel, path, > NULL, > + &total_groups); > > Do you need to perform it foreach partial path or just do it for > firstpartial path? That's true. The order here does not matter since we're passing directly into a Gather node, so it's wasteful to consider anything apart from the cheapest path. -- Fixed. There was also a missing hash table size check on the Finalize HashAggregate Path consideration. I've added that now. Updated patch is attached. Thanks for the re-review. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
0001-Allow-aggregation-to-happen-in-parallel_2016-03-18a.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers