Re: [HACKERS] Parallel Aggregate

David Rowley Fri, 18 Mar 2016 22:57:28 -0700

On 18 March 2016 at 01:22, Amit Kapila <[email protected]> wrote:
> On Thu, Mar 17, 2016 at 10:35 AM, David Rowley
> <[email protected]> wrote:
>>
>> On 17 March 2016 at 01:19, Amit Kapila <[email protected]> wrote:
>> > Few assorted comments:
>> >
>> > 2.
>> >  AggPath *
>> >  create_agg_path(PlannerInfo *root,
>> > @@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
>> >
>> > List *groupClause,
>> >   List *qual,
>> >   const AggClauseCosts
>> > *aggcosts,
>> > - double numGroups)
>> > + double numGroups,
>> > +
>> > bool combineStates,
>> > + bool finalizeAggs)
>> >
>> > Don't you need to set parallel_aware flag in this function as we do for
>> > create_seqscan_path()?
>>
>> I don't really know the answer to that... I mean there's nothing
>> special done in nodeAgg.c if the node is running in a worker or in the
>> main process.
>>
>
> On again thinking about it, I think it is okay to set parallel_aware flag as
> false.  This flag means whether that particular node has any parallelism
> behaviour which is true for seqscan, but I think not for partial aggregate
> node.
>
> Few other comments on latest patch:
>
> 1.
>
> + /*
> + * XXX does this need estimated for each partial path, or are they
> + * all
> going to be the same anyway?
> + */
> + dNumPartialGroups = get_number_of_groups(root,
> +
> clamp_row_est(partial_aggregate_path->rows),
> +
> rollup_lists,
> +
> rollup_groupclauses);
>
> For considering partial groups, do we need to rollup related lists?


No it doesn't you're right. I did mean to remove these, but they're
NIL anyway. Seems better to remove them to prevent confusion.

> 2.
> + hashaggtablesize = estimate_hashagg_tablesize(partial_aggregate_path,
> +
>  &agg_costs,
> +
>  dNumPartialGroups);
> +
> + /*
> + * Generate a
> hashagg Path, if we can, but we'll skip this if the hash
> + * table looks like it'll exceed work_mem.
> +
> */
> + if (can_hash && hashaggtablesize < work_mem * 1024L)
>
>
> hash table size should be estimated only if can_hash is true.

Good point. Changed.

>
> 3.
> + foreach(lc, grouped_rel->partial_pathlist)
> + {
> + Path   *path =
> (Path *) lfirst(lc);
> + double total_groups;
> +
> + total_groups = path-
>>parallel_degree * path->rows;
> +
> + path = (Path *) create_gather_path(root, grouped_rel, path,
> NULL,
> +   &total_groups);
>
> Do you need to perform it foreach partial path or just do it for
> firstpartial path?

That's true. The order here does not matter since we're passing
directly into a Gather node, so it's wasteful to consider anything
apart from the cheapest path. -- Fixed.

There was also a missing hash table size check on the Finalize
HashAggregate Path consideration. I've added that now.

Updated patch is attached. Thanks for the re-review.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

0001-Allow-aggregation-to-happen-in-parallel_2016-03-18a.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Aggregate

Reply via email to