On Fri, 14 Jun 2019 at 11:45, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
> On Wed, Jun 12, 2019 at 10:58:44AM +0800, Richard Guo wrote:
> ># explain (costs off, verbose) select c1, c2, avg(c3) from t2 group by
> >grouping sets((c1,c2), (c1));
> >                          QUERY PLAN
> >--------------------------------------------------------------
> > Finalize GroupAggregate
> >   Output: c1, c2, avg(c3), (gset_id)
> >   Group Key: t2.c1, t2.c2, (gset_id)
> >   ->  Gather Merge
> >         Output: c1, c2, (gset_id), (PARTIAL avg(c3))
> >         Workers Planned: 2
> >         ->  Sort
> >               Output: c1, c2, (gset_id), (PARTIAL avg(c3))
> >               Sort Key: t2.c1, t2.c2, (gset_id)
> >               ->  Partial HashAggregate
> >                     Output: c1, c2, gset_id, PARTIAL avg(c3)
> >                     Hash Key: t2.c1, t2.c2
> >                     Hash Key: t2.c1
> >                     ->  Parallel Seq Scan on public.t2
> >                           Output: c1, c2, c3
> >(15 rows)
> >
>
> OK, I'm not sure I understand the point of this - can you give an
> example which is supposed to benefit from this? Where does the speedup
> came from?

I think this is a bad example, since the first grouping set is a superset
of the 2nd.  If the grouping sets were independent and each produced a
reasonable number of groups, then it may be better to do it this way
rather than grouping by all exprs in all grouping sets in the first
phase, as is done by #1.

To do #2 would require that we tag the aggregate state with the grouping
set it belongs to, which seems to be what gset_id is in Richard's output.

In my example upthread, the first phase of aggregation produced a group
per input row.  Method #2 would work better for that case since it would
only produce 2000 groups instead of 1 million.

Likely both methods would be good to consider, but since #1 seems much
easier than #2, it seems to make sense to me to start there.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
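
To make the #1 vs. #2 distinction concrete, here is a hypothetical setup
with two independent grouping sets. The table t3, its data, and the group
counts are invented for illustration and are not from the thread; they are
only meant to mirror the 2000-vs-1-million figures mentioned above.

-- Two independent grouping columns, each with 1000 distinct values,
-- so the (c1, c2) cross product has ~1 million distinct combinations.
create table t3 (c1 int, c2 int, c3 numeric);
insert into t3
  select i % 1000, (i / 1000) % 1000, i
  from generate_series(0, 999999) i;
analyze t3;

-- Under method #1 the partial phase groups by all exprs in both sets,
-- i.e. (c1, c2), producing ~1 million partial groups (one per input
-- row here).  Under method #2 the partial phase keeps a separate
-- aggregate state per grouping set, tagged with something like
-- gset_id, producing only ~1000 + 1000 = 2000 partial groups.
explain (costs off, verbose)
select c1, c2, avg(c3) from t3
group by grouping sets ((c1), (c2));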