On 5 March 2016 at 07:25, Robert Haas <robertmh...@gmail.com> wrote: > On Thu, Mar 3, 2016 at 11:00 PM, David Rowley >> 3. The code never attempts to mix and match Grouping Agg and Hash Agg >> plans. e.g it could be an idea to perform Partial Hash Aggregate -> >> Gather -> Sort -> Finalize Group Aggregate, or hash as in the Finalize >> stage. I just thought doing this is more complex than what's really >> needed, but if someone can think of a case where this would be a great >> win then I'll listen, but you have to remember we don't have any >> pre-sorted partial paths at this stage, so an explicit sort is >> required *always*. This might change if someone invented partial btree >> index scans... but until then... > > Actually, Rahila Syed is working on that. But it's not done yet, so > presumably will not go into 9.6. > > I don't really see the logic of this, though. Currently, Gather > destroys the input ordering, so it seems preferable for the > finalize-aggregates stage to use a hash aggregate whenever possible, > whatever the partial-aggregate stage did. Otherwise, we need an > explicit sort. Anyway, it seems like the two stages should be costed > and decided on their own merits - there's no reason to chain the two > decisions together.
Thanks for looking at this. I've attached an updated patch which re-bases the whole patch on top of the upper planner changes which have just been committed. In this version create_grouping_paths() does now consider mixed strategies of hashed and sorted, although I have a few concerns with the code that I've written. I'm solely posting this early to minimise any duplicate work. My concerns are: 1. Since there's no cheapest_partial_path in RelOptInfo the code is currently considering every partial_path for parallel hash aggregate. With normal aggregation we only ever use the cheapest path, so this may not be future proof. As of today we do only have at most one partial path in the list, but there's no reason to code this with that assumption. I didn't put in much effort to improve this as I see code in generate_gather_paths() which also makes assumptions about there just being 1 partial path. Perhaps we should expand RelOptInfo to track the cheapest partial path? or maybe allpaths.c should have a function to fetch the cheapest out of the list? 2. In mixed parallel aggregate mode, when the query has no aggregate functions, the code currently will use a nodeAgg for AGG_SORTED strategy rather than a nodeGroup, as it would in serial agg mode. This probably needs to be changed. 3. Nothing in create_grouping_paths() looks at the force_parallel_mode GUC. I had a quick look at this GUC and was a bit surprised to see 3 possible states, but no explanation of what they do, so I've not added code which pays attention to this setting yet. I'd imagine this is just a matter of skipping serial path generation when parallel is possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea what FORCE_PARALLEL_REGRESS is for yet. The setrefs.c parts of the patch remain completely broken. I've not had time to look at this again yet, sorry. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
parallel_aggregation_cc75f61_2016-03-08.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers