To summarize the current state of parallel grouping sets: we now have
two implementations available.

1) Each worker performs an aggregation step, producing a partial result
for each group of which that process is aware. Then the partial results
are gathered to the leader, which then performs a grouping sets
aggregation, as in patch [1].

This implementation can be inefficient, because the group key for the
Partial Aggregate has to include all the columns involved in the
grouping sets.
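
As a rough illustration of 1), here is a toy Python model, not the
executor code from patch [1]. The columns a and b, the sum(v) aggregate,
the sample rows and all function names are made up for the example. It
assumes GROUP BY GROUPING SETS ((a), (b)).

```python
from collections import defaultdict

# Hypothetical query: SELECT a, b, sum(v) ... GROUP BY GROUPING SETS ((a), (b))
GROUPING_SETS = [("a",), ("b",)]
ALL_KEYS = ("a", "b")  # every column that appears in any grouping set


def partial_aggregate(rows):
    """Worker: group by all columns involved in the grouping sets."""
    partial = defaultdict(int)
    for row in rows:
        partial[tuple(row[k] for k in ALL_KEYS)] += row["v"]
    return dict(partial)


def leader_grouping_sets(partials):
    """Leader: grouping sets aggregation over the gathered partial results."""
    result = defaultdict(int)
    for partial in partials:
        for key, partial_sum in partial.items():
            by_name = dict(zip(ALL_KEYS, key))
            for gset in GROUPING_SETS:
                # Columns outside the current grouping set come out as None (NULL).
                out = tuple(by_name[k] if k in gset else None for k in ALL_KEYS)
                result[out] += partial_sum
    return dict(result)


# Made-up input rows for two workers.
worker1_rows = [{"a": 1, "b": 10, "v": 5}, {"a": 1, "b": 20, "v": 7}]
worker2_rows = [{"a": 2, "b": 10, "v": 3}, {"a": 1, "b": 10, "v": 1}]

partials = [partial_aggregate(worker1_rows), partial_aggregate(worker2_rows)]
final = leader_grouping_sets(partials)
```

In this model a worker emits one partial group per distinct (a, b)
combination it sees, which is why the partial step may reduce the row
count only a little.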

2) Each worker performs a grouping sets aggregation on its partial
data and tags each tuple produced by the partial aggregate with a
'GroupingSetId'. The partial results are then gathered to the leader,
which finishes the aggregation in one of two ways. Either it performs
a modified grouping aggregate that dispatches the partial results into
different pipes according to 'GroupingSetId', as in patch [2], or it
performs a normal aggregation with 'GroupingSetId' included in the
group keys, as discussed in [3].
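
A toy Python model of 2), covering both leader variants, might look
like the following. It is only a sketch and not the code from [2] or
[3]. The columns a and b, the sum(v) aggregate, the sample rows and the
function names are all invented, and the "pipes" are modeled as one
dictionary per grouping set.

```python
from collections import defaultdict

# Hypothetical query: SELECT a, b, sum(v) ... GROUP BY GROUPING SETS ((a), (b))
GROUPING_SETS = [("a",), ("b",)]
ALL_KEYS = ("a", "b")


def worker_grouping_sets(rows):
    """Worker: grouping sets aggregation on local rows, tagged with a set id."""
    partial = defaultdict(int)
    for row in rows:
        for set_id, gset in enumerate(GROUPING_SETS):
            key = tuple(row[k] if k in gset else None for k in ALL_KEYS)
            partial[(set_id, key)] += row["v"]
    return dict(partial)


def leader_dispatch(partials):
    """Variant as in [2]: route each partial row to the pipe for its set id."""
    pipes = [defaultdict(int) for _ in GROUPING_SETS]
    for partial in partials:
        for (set_id, key), partial_sum in partial.items():
            pipes[set_id][key] += partial_sum
    return [dict(pipe) for pipe in pipes]


def leader_plain_aggregate(partials):
    """Variant as in [3]: normal aggregation with the set id in the group key."""
    result = defaultdict(int)
    for partial in partials:
        for tagged_key, partial_sum in partial.items():
            result[tagged_key] += partial_sum
    return dict(result)


# Made-up input rows for two workers.
worker1_rows = [{"a": 1, "b": 10, "v": 5}, {"a": 1, "b": 20, "v": 7}]
worker2_rows = [{"a": 2, "b": 10, "v": 3}, {"a": 1, "b": 10, "v": 1}]

partials = [worker_grouping_sets(worker1_rows), worker_grouping_sets(worker2_rows)]
dispatched = leader_dispatch(partials)
plain = leader_plain_aggregate(partials)
```

Both variants combine the same partial sums. They differ only in
whether the set id selects a separate pipe or is part of the group key.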

The second implementation should generally perform better than the
first one, and we have decided to concentrate on it.

[1]
https://www.postgresql.org/message-id/CAN_9JTx3NM12ZDzEYcOVLFiCBvwMHyM0gENvtTpKBoOOgcs=k...@mail.gmail.com
[2]
https://www.postgresql.org/message-id/can_9jtwtttnxhbr5ahuqvcriz3hxvppx1jwe--dcsdjyuhr...@mail.gmail.com
[3]
https://www.postgresql.org/message-id/CAN_9JTwtzttEmdXvMbJqXt=51kxibtckepkq6kk2pz6xz6m...@mail.gmail.com

Thanks
Richard
