On 09/15/2016 10:12 PM, Peter Geoghegan wrote:
> On Wed, Sep 14, 2016 at 10:43 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>> Spotted another issue with this code just now. Shouldn't it be based
>> on state->tapeRange? You don't want the destTape to get memory, since
>> you don't use batch memory for tapes that are written to (and final
>> on-the-fly merges don't use their destTape at all).
>
> Wait, you're using a local variable maxTapes here, which potentially
> differs from state->maxTapes:
I changed that so that it does actually change state->maxTapes. I considered
having a separate numTapes field that could be smaller than maxTapes, but we
don't need the original maxTapes value after that point anymore, so tracking
them separately would have been just pro forma. I hope the comment now
explains that better.
> I still don't get why you're doing all of this within mergeruns() (the
> beginning of when we start merging -- we merge all quicksorted runs),
> rather than within beginmerge() (the beginning of one particular merge
> pass, of which there are potentially more than one). As runs are
> merged in a non-final merge pass, fewer tapes will remain active for
> the next merge pass. It doesn't do to do all that up-front when we
> have multiple merge passes, which will happen from time to time.
Now that the pre-reading is done in logtape.c, it doesn't stop at a run
boundary. For example, when we read the last 1 MB of the first run on a
tape, and we're using a 10 MB read buffer, we will merrily also read the
first 9 MB from the next run. You cannot un-read that data, even if the
tape is inactive in the next merge pass.
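To make the arithmetic above concrete, here's a back-of-the-envelope sketch (plain Python with the illustrative sizes from the example; a stand-in, not actual logtape.c behavior):

```python
# Sketch of why a fixed-size read buffer overshoots a run boundary.
# Sizes match the example above; this is illustrative, not logtape.c.

MB = 1024 * 1024
buffer_size = 10 * MB       # read buffer for this tape
run1_remaining = 1 * MB     # last 1 MB of the first run on the tape

# A single sequential read fills the whole buffer; nothing in the
# on-tape format stops the read at a run boundary.
from_run1 = min(buffer_size, run1_remaining)
from_run2 = buffer_size - from_run1

print(from_run1 // MB, from_run2 // MB)  # 1 9: run 2 data read early
```

Once those 9 MB of the second run sit in the buffer, there is no way to hand them back, which is why the buffer setup can't easily be redone per merge pass.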
I don't think it makes much difference in practice, because most merge
passes use all, or almost all, of the available tapes. BTW, I think the
polyphase algorithm prefers to do the merges that don't use all tapes
up front, so that the final merge always uses all the tapes. I'm not
100% sure about that, but that's my understanding of the algorithm, and
it matches what I've seen in my testing.
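For what it's worth, a sketch of the schedule (hedged: this is the textbook, idealized polyphase algorithm, not tuplesort.c; the distribution and tape counts are made up for illustration) shows that with a "perfect" run distribution every pass, including the final one, is a full-width merge:

```python
# Idealized polyphase merge schedule (textbook version, not the
# PostgreSQL implementation). 7 tapes: 6 inputs + 1 output per pass.

def polyphase_passes(runs):
    """runs[i] = run count on tape i; exactly one entry must be 0
    (the initial output tape). Returns (width, merges) per pass."""
    passes = []
    while sum(1 for r in runs if r > 0) > 1:
        out = runs.index(0)                    # the idle tape receives
        inputs = [i for i in range(len(runs)) if i != out]
        merges = min(runs[i] for i in inputs)  # until one input empties
        for i in inputs:
            runs[i] -= merges
        runs[out] += merges
        passes.append((len(inputs), merges))
    return passes

# A "perfect" distribution of 11 initial runs over 6 input tapes:
print(polyphase_passes([2, 2, 2, 2, 2, 1, 0]))  # [(6, 1), (6, 1)]
```

Both passes here are 6-way merges, consistent with the observation that most passes use all, or almost all, of the tapes.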
> Correct me if I'm wrong, but I think that you're more skeptical of the
> need for polyphase merge than I am. I at least see no reason to not
> keep it around. I also think it has some value. It doesn't make this
> optimization any harder, really.
We certainly still need multi-pass merges.
BTW, does a 1-way merge make any sense? I was surprised to see this in
the log, even without this patch:
LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 3-way merge step: CPU 0.62s/7.23u sec elapsed 8.44 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 6-way merge step: CPU 0.62s/7.24u sec elapsed 8.44 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 6-way merge step: CPU 0.62s/7.24u sec elapsed 8.45 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
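One plausible explanation, hedged (this is the textbook dummy-run treatment of polyphase merging, not something verified against tuplesort.c): when the initial run count falls short of a perfect distribution, the algorithm pads tapes with dummy runs, and a merge step whose inputs are all dummies except one degenerates into a 1-way "merge" that just passes a single run through. A toy sketch of one pass, with a hypothetical run distribution:

```python
# Toy sketch of one polyphase pass with dummy runs (textbook
# treatment, not tuplesort.c). Each step consumes one run, dummy or
# real, from every input tape; dummies contribute no data, so a step
# with a single real input is effectively a 1-way merge.

def pass_step_widths(real, dummy):
    """real[i]/dummy[i]: real and dummy run counts on input tape i.
    Returns the width (number of real inputs) of each merge step."""
    widths = []
    steps = max(r + d for r, d in zip(real, dummy))
    for _ in range(steps):
        width = 0
        for i in range(len(real)):
            if dummy[i] > 0:
                dummy[i] -= 1          # dummy run: nothing to read
            elif real[i] > 0:
                real[i] -= 1
                width += 1             # real run feeds the merge
        widths.append(width)
    return widths

# Hypothetical distribution: tape 0 has 4 real runs, tapes 1-5 have
# 1 real run plus 3 dummies each:
print(pass_step_widths([4, 1, 1, 1, 1, 1], [0, 3, 3, 3, 3, 3]))
# -> [1, 1, 1, 6]: several 1-way steps before a full-width merge
```

That would match a log showing a handful of 1-way steps followed by wider merges, though whether the 1-way steps are worth the I/O of copying a run through is a fair question.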
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers