I also had a quick look at the patch and the comments made so far. Summary:

1. The performance results are promising.

2. The code needs comments.

Regarding the design:

Thomas Munro mentioned the idea of a "Parallel Repartition" node that would redistribute tuples like this. As I understand it, the difference is that this BatchSort implementation collects all tuples in a tuplesort or a tuplestore, while a Parallel Repartition node would just redistribute the tuples to the workers, without buffering. The receiving worker could then put the tuples into a tuplestore or a tuplesort if needed.
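
To illustrate what I mean by non-buffering, here's a very rough sketch of what such a node's ExecProcNode function might look like. This is just to show the idea: RedistributeState and the helper functions (compute_target_worker, send_tuple_to_worker, receive_tuple_from_peers, all_peers_done) are made-up names for illustration, not existing executor APIs.

    /*
     * Sketch of a non-buffering Parallel Redistribute node.  Each tuple is
     * either returned locally or handed straight to the worker that owns
     * its hash partition; nothing is accumulated in this node.
     */
    static TupleTableSlot *
    ExecParallelRedistribute(PlanState *pstate)
    {
        RedistributeState *node = (RedistributeState *) pstate;

        for (;;)
        {
            TupleTableSlot *slot;

            /* First, pull tuples from our own child plan. */
            if (!node->child_done)
            {
                slot = ExecProcNode(outerPlanState(node));
                if (TupIsNull(slot))
                    node->child_done = true;    /* tell the peers we're done */
                else
                {
                    /* hash the partitioning keys to pick the target worker */
                    int     dest = compute_target_worker(node, slot);

                    if (dest == node->my_worker_number)
                        return slot;            /* ours, return it directly */

                    /* hand it off to the owning worker, no local buffering */
                    send_tuple_to_worker(node, dest, slot);
                    continue;
                }
            }

            /* Child exhausted: drain tuples other workers sent to us. */
            slot = receive_tuple_from_peers(node);
            if (slot != NULL)
                return slot;
            if (all_peers_done(node))
                return NULL;                    /* everyone is finished */
            /* otherwise wait for more tuples from the peers and loop */
        }
    }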

I think a non-buffering Repartition node would be simpler, and thus better. In these patches, you have a BatchSort node and a batchstore node, but a single Parallel Repartition node could cover both. For example, to implement DISTINCT:

Gather
  -> Unique
     -> Sort
        -> Parallel Redistribute
           -> Parallel Seq Scan

And a Hash Agg would look like this:

Gather
  -> Hash Agg
     -> Parallel Redistribute
        -> Parallel Seq Scan


I'm marking this as Waiting on Author in the commitfest.

- Heikki