> On Tue, Oct 27, 2020 at 3:27 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
> >
> > On Fri, Oct 23, 2020 at 11:58 AM bu...@sohu.com <bu...@sohu.com> wrote:
> > >
> > > > Interesting idea. So IIUC, whenever a worker is scanning a tuple it
> > > > will directly put it into the respective batch (shared tuple store),
> > > > based on the hash of the grouping column, and once all the workers are
> > > > done preparing the batches, each worker will pick those batches one
> > > > by one, perform the sort and finish the aggregation. I think there is
> > > > scope for improvement: instead of directly putting the tuple into the
> > > > batch, what if the worker does the partial aggregation and then
> > > > places the partially aggregated rows in the shared tuple store based
> > > > on the hash value, and then the workers can pick the batches one by
> > > > one. By doing it this way, we can avoid doing large sorts. And this
> > > > approach can also be used with the hash aggregate, I mean the
> > > > partially aggregated data from the hash aggregate can be put into the
> > > > respective batch.
> > >
> > > Good idea. Batch sort is suitable for large aggregate result rows;
> > > with a large aggregate result, partial aggregation may run out of
> > > memory, and all aggregate functions must support partial aggregation
> > > (with batch sort this is unnecessary).
> > >
> > > Actually I wrote a batch hash store for hash aggregate (for PG11) along
> > > the lines of this idea, but it does not write partial aggregations to
> > > the shared tuple store; it writes the original tuple and hash value
> > > to the shared tuple store. However, it does not support parallel
> > > grouping sets. I am trying to write parallel hash aggregate support
> > > using a batch shared tuple store for PG14, and it needs to support
> > > parallel grouping sets hash aggregate.
> >
> > I was trying to look into this patch to understand the logic in more
> > detail.  Actually, there are no comments at all, so it's really hard to
> > understand what the code is trying to do.
> >
> > I was reading the function below, which is the main entry point for
> > the batch sort.
> >
> > +static TupleTableSlot *ExecBatchSortPrepare(PlanState *pstate)
> > +{
> > ...
> > + for (;;)
> > + {
> > ...
> > + tuplesort_puttupleslot(state->batches[hash%node->numBatches], slot);
> > + }
> > +
> > + for (i=node->numBatches;i>0;)
> > + tuplesort_performsort(state->batches[--i]);
> > +build_already_done_:
> > + if (parallel)
> > + {
> > + for (i=node->numBatches;i>0;)
> > + {
> > + --i;
> > + if (state->batches[i])
> > + {
> > + tuplesort_end(state->batches[i]);
> > + state->batches[i] = NULL;
> > + }
> > + }
> >
> > I did not understand this part: once each worker has performed
> > its local batch-wise sort, why are we clearing the batches?  I mean,
> > individual workers have their own batches, so eventually they are
> > supposed to get merged.  Can you explain this part?  It would also be
> > better if you could add comments.
>
> I think I got this.  IIUC, each worker is initializing the shared
> sort and performing the batch-wise sorting, and we will wait on a
> barrier so that all the workers can finish their sorting.  Once
> that is done the workers will coordinate, pick the batches one by one,
> and perform the final merge for each batch.
Yes, that's right. Each worker opens the shared sort as a "worker" (nodeBatchSort.c:134); after all workers have finished performing their sorts, each one picks a batch and opens it as the "leader" (nodeBatchSort.c:54).
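
Roughly, the flow looks like the sketch below. This is only a simplified
illustration of the worker/leader phases, not the actual patch code: the
tuplesort and Barrier calls are the standard PostgreSQL APIs, but
BatchSortSketch, BatchSortState (and its fields), ComputeGroupingHash(),
ClaimNextBatch(), OpenBatchAsLeader() and EmitSortedTuple() are made-up
placeholder names used only for illustration.

/* Simplified sketch; placeholder names, not the patch's identifiers. */
static void
BatchSortSketch(BatchSortState *node, PlanState *outerPlan)
{
    TupleTableSlot *slot;
    uint32      hash;
    int         i;
    int         batchno;

    /*
     * Worker phase: scan our share of the input and route each tuple, by
     * hash of the grouping columns, to a per-batch shared tuplesort that
     * was opened with a SortCoordinate whose isWorker field is true.
     */
    for (;;)
    {
        slot = ExecProcNode(outerPlan);
        if (TupIsNull(slot))
            break;
        hash = ComputeGroupingHash(node, slot);
        tuplesort_puttupleslot(node->batches[hash % node->numBatches], slot);
    }

    /* Finish this worker's sorted run for every batch ... */
    for (i = 0; i < node->numBatches; i++)
        tuplesort_performsort(node->batches[i]);

    /*
     * ... and close the worker-side state.  The sorted runs stay behind in
     * the shared fileset, so the worker's Tuplesortstate is no longer
     * needed -- this is the tuplesort_end() loop asked about above.
     */
    for (i = 0; i < node->numBatches; i++)
    {
        tuplesort_end(node->batches[i]);
        node->batches[i] = NULL;
    }

    /* Wait until every participant has written its runs for all batches. */
    BarrierArriveAndWait(node->barrier, WAIT_EVENT_PARALLEL_FINISH);

    /*
     * Leader phase: participants claim batches one by one; a claimed batch
     * is reopened with a SortCoordinate whose isWorker field is false (the
     * "leader" side of the parallel tuplesort API), which merges the runs
     * written by all workers for that batch.
     */
    while ((batchno = ClaimNextBatch(node)) >= 0)
    {
        Tuplesortstate *sort = OpenBatchAsLeader(node, batchno);

        tuplesort_performsort(sort);    /* merge the workers' runs */
        while (tuplesort_gettupleslot(sort, true, false, node->outslot, NULL))
            EmitSortedTuple(node, node->outslot);
        tuplesort_end(sort);
    }
}

The barrier matters because a batch can only be opened as "leader" and
merged once every worker has finished writing its sorted runs for that
batch.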