On Fri, Nov 27, 2020 at 10:55 AM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> I think a non-buffering Repartition node would be simpler, and thus
> better. In these patches, you have a BatchSort node, and batchstore, but
> a simple Parallel Repartition node could do both. For example, to
> implement distinct:
>
> Gather
>   -> Unique
>     -> Sort
>       -> Parallel Redistribute
>         -> Parallel Seq Scan
>
> And a Hash Agg would look like this:
>
> Gather
>   -> Hash Agg
>     -> Parallel Redistribute
>       -> Parallel Seq Scan
>
> I'm marking this as Waiting on Author in the commitfest.
I'm also intrigued by the parallel redistribute operator -- it seems like it might be more flexible than this approach. However, I'm concerned that there may be deadlock risks. If there is no buffer, or a fixed-size buffer, the buffer might be full, and a process trying to jam tuples into the parallel redistribute would have to wait. Now if A can wait for B and at the same time B can wait for A, deadlock will ensue. In a naive implementation, this could happen with a single parallel redistribute operator: worker 1 is trying to send a tuple to worker 2, which can't receive it because it's busy sending a tuple to worker 1. That could probably be fixed by arranging for workers to try to receive data whenever they block in the middle of sending data.

However, in general there can be multiple nodes that cause waiting in the tree: any number of Parallel Redistribute nodes, plus a Gather, plus maybe other stuff. The cheap way out of that problem is to use a buffer that can grow arbitrarily large, but that's not terribly satisfying either.

--
Robert Haas
EDB: http://www.enterprisedb.com
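[Editor's note: the single-node deadlock and the receive-while-blocked fix described above can be illustrated with a small simulation. This is a hypothetical sketch, not PostgreSQL code -- the workers, queue sizes, and tuple counts are invented for illustration. Two workers exchange tuples through fixed-size inboxes standing in for a Parallel Redistribute buffer; a naive blocking send would deadlock when each worker's inbox is full and each is stuck sending to the other, so instead a worker drains its own inbox whenever a send would block.]

```python
import threading
from queue import Queue, Full, Empty

# Hypothetical setup: two workers, each with a tiny fixed-size inbox,
# as a fixed-size Parallel Redistribute buffer would have.
INBOX_SIZE = 1
N_TUPLES = 100
inboxes = {1: Queue(maxsize=INBOX_SIZE), 2: Queue(maxsize=INBOX_SIZE)}

def worker(me, peer, received):
    for i in range(N_TUPLES):
        while True:
            try:
                inboxes[peer].put_nowait((me, i))  # try to send one tuple
                break
            except Full:
                # Blocked mid-send: receive instead of waiting, so a peer
                # stuck sending into our full inbox can make progress.
                # A naive blocking put() here would deadlock.
                try:
                    received.append(inboxes[me].get_nowait())
                except Empty:
                    pass  # a real implementation would sleep on a latch
    # All our tuples are sent; drain whatever is still in flight for us.
    while len(received) < N_TUPLES:
        received.append(inboxes[me].get())

got1, got2 = [], []
t1 = threading.Thread(target=worker, args=(1, 2, got1))
t2 = threading.Thread(target=worker, args=(2, 1, got2))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(got1), len(got2))  # both workers received all N_TUPLES tuples
```

With `put_nowait` replaced by a blocking `put`, both threads can wedge with both size-1 inboxes full, which is the single-operator deadlock described above; it does not, of course, address the harder case of multiple blocking nodes stacked in one plan tree.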