> If I understood correctly, the tuples emitted by Parallel Batch Sort
> in each process are ordered by (hash(key, ...) % npartitions, key,
> ...), but the path is claiming to be ordered by (key, ...), no?
> That's enough for Unique and Aggregate to give the correct answer,
> because they really only require equal keys to be consecutive (and in
> the same process), but maybe some other plan could break?
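A toy illustration of the ordering described in the quote above. It assumes a hypothetical stand-in hash, `key % npartitions` (PostgreSQL's real hash functions differ), and a single process that emits every batch in batch-number order. The names `toy_partition`, `batch_sort_order` and `unique_adjacent` are made up for this sketch.

```python
def toy_partition(key, npartitions):
    # Hypothetical stand-in for hash(key) % npartitions: an identity "hash"
    # keeps the example easy to follow; real hash functions differ.
    return key % npartitions

def batch_sort_order(keys, npartitions):
    # One process emitting every batch in batch-number order: rows come out
    # sorted by (partition, key), not by key alone.
    return sorted(keys, key=lambda k: (toy_partition(k, npartitions), k))

def unique_adjacent(rows):
    # Unique-style dedup: only compares each row with the previous one.
    out = []
    for row in rows:
        if not out or out[-1] != row:
            out.append(row)
    return out

keys = [5, 3, 9, 3, 1, 5, 7, 1, 9]
rows = batch_sort_order(keys, npartitions=3)
print(rows)                   # [3, 3, 9, 9, 1, 1, 7, 5, 5]
print(unique_adjacent(rows))  # [3, 9, 1, 7, 5]
print(rows == sorted(rows))   # False: grouped, but not ordered by key
```

Adjacent duplicates are still removed correctly, which is why Unique and GroupAggregate give the right answer; a consumer that relied on the stream being sorted by the key itself would not.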
The path does not claim to be ordered by (key, ...): the path saves its PathKey(s) in BatchSortPath::batchkeys, not in Path::pathkeys.

I don't understand "but maybe some other plan could break". Do you mean some other path built on top of this path? No: BatchSortPath is only used for some special paths (Unique, GroupAgg, ...).

bu...@sohu.com

From: Thomas Munro
Date: 2020-10-21 12:27
To: bu...@sohu.com
CC: pgsql-hackers
Subject: Re: parallel distinct union and aggregate support patch

On Tue, Oct 20, 2020 at 3:49 AM bu...@sohu.com <bu...@sohu.com> wrote:
> I wrote a patch to support parallel distinct, union and aggregate using
> batch sort.
> Steps:
> 1. generate a hash value for the group clause values, and use the hash
>    value modulo the number of batches to choose the batch to save the tuple to
> 2. at the end of the outer plan, wait for all other workers to finish
>    writing to the batches
> 3. each worker gets a unique batch number and calls tuplesort_performsort()
>    to finish sorting this batch
> 4. return the rows for this batch
> 5. if not at the end of all batches, go to step 3
>
> The BatchSort plan makes sure tuples with the same group clause values are
> returned in the same range, so a Unique (or GroupAggregate) plan can work.

Hi!

Interesting work! In the past a few people have speculated about a Parallel Repartition operator that could partition tuples a bit like this, so that each process gets a different set of partitions. Here you combine that with a sort. By doing both things in one node, you avoid a lot of overheads (writing into a tuplestore once in the repartitioning node, and then once again in the sort node, with tuples being copied one-by-one between the two nodes).

If I understood correctly, the tuples emitted by Parallel Batch Sort in each process are ordered by (hash(key, ...) % npartitions, key, ...), but the path is claiming to be ordered by (key, ...), no? That's enough for Unique and Aggregate to give the correct answer, because they really only require equal keys to be consecutive (and in the same process), but maybe some other plan could break?
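The five steps in the quoted patch description can be sketched roughly as follows. This is a simulation under stated assumptions, not the patch's code: Python threads stand in for parallel workers, `threading.Barrier` stands in for step 2's wait, a lock-protected counter is an assumed way of handing out unique batch numbers, and `sorted()` stands in for `tuplesort_performsort()`. `toy_hash`, `parallel_batch_sort` and `distinct_per_worker` are hypothetical names.

```python
import threading

def toy_hash(key):
    # Hypothetical stand-in for hashing the group clause values.
    return key * 31 + 7

def parallel_batch_sort(worker_inputs, npartitions):
    nworkers = len(worker_inputs)
    batches = [[] for _ in range(npartitions)]
    write_lock = threading.Lock()
    claim_lock = threading.Lock()
    barrier = threading.Barrier(nworkers)
    next_batch = [0]  # shared counter used to hand out batch numbers
    results = [[] for _ in range(nworkers)]

    def worker(wid):
        # Step 1: save each tuple to batch hash(key) % npartitions.
        for key in worker_inputs[wid]:
            with write_lock:
                batches[toy_hash(key) % npartitions].append(key)
        # Step 2: wait until every worker has finished writing.
        barrier.wait()
        while True:
            # Step 3: claim a batch number that no other worker gets.
            with claim_lock:
                b = next_batch[0]
                next_batch[0] += 1
            # Step 5: stop once every batch has been claimed.
            if b >= npartitions:
                break
            # Steps 3-4: sort the claimed batch and emit its rows.
            results[wid].extend(sorted(batches[b]))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # one output stream per worker

def distinct_per_worker(streams):
    # Unique-style dedup run independently on each worker's stream.
    out = []
    for stream in streams:
        for i, key in enumerate(stream):
            if i == 0 or stream[i - 1] != key:
                out.append(key)
    return out

streams = parallel_batch_sort([[4, 1, 4, 9], [1, 7, 9, 2], [2, 4, 7]], npartitions=4)
print(sorted(distinct_per_worker(streams)))  # [1, 2, 4, 7, 9]
```

Because every copy of a key lands in the same batch and each batch is claimed by exactly one worker, per-worker deduplication never sees the same key in two processes; which worker ends up with which batch varies from run to run.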