Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

2016-08-12 Thread Robert Haas
On Wed, Aug 10, 2016 at 4:54 PM, Peter Geoghegan wrote:
> On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas wrote:
>> My view on this - currently anyway - is that we shouldn't conflate the
>> tuplesort with the subsequent index generation, but that we should try
>> to use parallelism within the tuple

Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

2016-08-10 Thread Peter Geoghegan
On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas wrote:
> I think that last part is a very important property; my intuition is
> that dividing up the work between cooperating processes in a way that
> should come out equal will often fail to do so, either due to the
> operating system scheduler or du

Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

2016-08-10 Thread Peter Geoghegan
On Wed, Aug 10, 2016 at 12:08 PM, Claudio Freire wrote:
> I think it's a great design, but for that, per-worker final tapes have
> to always be random-access.
Thanks. I don't think I need to live with the randomAccess restriction, because I can be clever about reading only the first tuple on each

Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

2016-08-10 Thread Peter Geoghegan
On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas wrote:
> My view on this - currently anyway - is that we shouldn't conflate the
> tuplesort with the subsequent index generation, but that we should try
> to use parallelism within the tuplesort itself to the greatest extent
> possible. If there is a

Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

2016-08-10 Thread Claudio Freire
On Mon, Aug 8, 2016 at 4:44 PM, Peter Geoghegan wrote:
> The basic idea I have in mind is that we create runs in workers in the
> same way that the parallel CREATE INDEX patch does (one output run per
> worker). However, rather than merging in the leader, we use a
> splitting algorithm to determin

Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

2016-08-10 Thread Robert Haas
On Mon, Aug 8, 2016 at 3:44 PM, Peter Geoghegan wrote:
> I don't think partitioning is urgent for CREATE INDEX, and may be
> inappropriate for CREATE INDEX under any circumstances, because:
>
> * Possible problems with parallel infrastructure and writes.
> * Unbalanced B-Trees (or the risk thereof

[HACKERS] Parallel tuplesort, partitioning, merging, and the future

2016-08-08 Thread Peter Geoghegan
Over on the "Parallel tuplesort (for parallel B-Tree index creation)" thread [1], there has been some discussion of merging vs. partitioning. There is a concern about the fact that the merge of the tuplesort used to build a B-Tree is not itself parallelized. There is a weak consensus that we'd be better