On 09/15/2016 10:12 PM, Peter Geoghegan wrote:
> On Wed, Sep 14, 2016 at 10:43 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>> Spotted another issue with this code just now. Shouldn't it be based
>> on state->tapeRange? You don't want the destTape to get memory, since
>> you don't use batch memory for tapes that are written to (and final
>> on-the-fly merges don't use their destTape at all).
>
> Wait, you're using a local variable maxTapes here, which potentially
> differs from state->maxTapes:
I changed that so that it does actually change state->maxTapes. I considered
having a separate numTapes field that could be smaller than maxTapes, but we
don't need the original maxTapes value after that point anymore, so tracking
them separately would have been just pro forma. I hope the comment now
explains that better.
> I still don't get why you're doing all of this within mergeruns() (the
> beginning of when we start merging -- we merge all quicksorted runs),
> rather than within beginmerge() (the beginning of one particular merge
> pass, of which there are potentially more than one). As runs are
> merged in a non-final merge pass, fewer tapes will remain active for
> the next merge pass. It doesn't do to do all that up-front when we
> have multiple merge passes, which will happen from time to time.
Now that the pre-reading is done in logtape.c, it doesn't stop at a run
boundary. For example, when we read the last 1 MB of the first run on a
tape, and we're using a 10 MB read buffer, we will merrily also read the
first 9 MB from the next run. You cannot un-read that data, even if the
tape is inactive in the next merge pass.
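To make the arithmetic above concrete, here's a back-of-the-envelope sketch (plain Python with the illustrative sizes from the example; a stand-in, not actual logtape.c behavior):

```python
# Sketch of why a fixed-size read buffer overshoots a run boundary.
# Sizes match the example above; this is illustrative, not logtape.c.

MB = 1024 * 1024
buffer_size = 10 * MB       # read buffer for this tape
run1_remaining = 1 * MB     # last 1 MB of the first run on the tape

# A single sequential read fills the whole buffer; nothing in the
# on-tape format stops the read at a run boundary.
from_run1 = min(buffer_size, run1_remaining)
from_run2 = buffer_size - from_run1

print(from_run1 // MB, from_run2 // MB)  # 1 9: run 2 data read early
```

Once those 9 MB of the second run sit in the buffer, there is no way to hand them back, which is why the buffer setup can't easily be redone per merge pass.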
I don't think it makes much difference in practice, because most merge
passes use all, or almost all, of the available tapes. BTW, I think the
polyphase algorithm prefers to do the merges that don't use all tapes
up front, so that the final merge always uses all the tapes. I'm not
100% sure about that, but that's my understanding of the algorithm, and
it matches what I've seen in my testing.
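For what it's worth, a sketch of the schedule (hedged: this is the textbook, idealized polyphase algorithm, not tuplesort.c; the distribution and tape counts are made up for illustration) shows that with a "perfect" run distribution every pass, including the final one, is a full-width merge:

```python
# Idealized polyphase merge schedule (textbook version, not the
# PostgreSQL implementation). 7 tapes: 6 inputs + 1 output per pass.

def polyphase_passes(runs):
    """runs[i] = run count on tape i; exactly one entry must be 0
    (the initial output tape). Returns (width, merges) per pass."""
    passes = []
    while sum(1 for r in runs if r > 0) > 1:
        out = runs.index(0)                    # the idle tape receives
        inputs = [i for i in range(len(runs)) if i != out]
        merges = min(runs[i] for i in inputs)  # until one input empties
        for i in inputs:
            runs[i] -= merges
        runs[out] += merges
        passes.append((len(inputs), merges))
    return passes

# A "perfect" distribution of 11 initial runs over 6 input tapes:
print(polyphase_passes([2, 2, 2, 2, 2, 1, 0]))  # [(6, 1), (6, 1)]
```

Both passes here are 6-way merges, consistent with the observation that most passes use all, or almost all, of the tapes.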
> Correct me if I'm wrong, but I think that you're more skeptical of the
> need for polyphase merge than I am. I at least see no reason to not
> keep it around. I also think it has some value. It doesn't make this
> optimization any harder, really.
We certainly still need multi-pass merges.
BTW, does a 1-way merge make any sense? I was surprised to see this in
the log, even without this patch:
LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 1-way merge step: CPU 0.62s/7.22u sec elapsed 8.43 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 3-way merge step: CPU 0.62s/7.23u sec elapsed 8.44 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 6-way merge step: CPU 0.62s/7.24u sec elapsed 8.44 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
LOG: finished 6-way merge step: CPU 0.62s/7.24u sec elapsed 8.45 sec
STATEMENT: SELECT COUNT(*) FROM (SELECT * FROM medium.random_ints ORDER BY i) t;
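One plausible explanation, hedged (this is the textbook dummy-run treatment of polyphase merging, not something verified against tuplesort.c): when the initial run count falls short of a perfect distribution, the algorithm pads tapes with dummy runs, and a merge step whose inputs are all dummies except one degenerates into a 1-way "merge" that just passes a single run through. A toy sketch of one pass, with a hypothetical run distribution:

```python
# Toy sketch of one polyphase pass with dummy runs (textbook
# treatment, not tuplesort.c). Each step consumes one run, dummy or
# real, from every input tape; dummies contribute no data, so a step
# with a single real input is effectively a 1-way merge.

def pass_step_widths(real, dummy):
    """real[i]/dummy[i]: real and dummy run counts on input tape i.
    Returns the width (number of real inputs) of each merge step."""
    widths = []
    steps = max(r + d for r, d in zip(real, dummy))
    for _ in range(steps):
        width = 0
        for i in range(len(real)):
            if dummy[i] > 0:
                dummy[i] -= 1          # dummy run: nothing to read
            elif real[i] > 0:
                real[i] -= 1
                width += 1             # real run feeds the merge
        widths.append(width)
    return widths

# Hypothetical distribution: tape 0 has 4 real runs, tapes 1-5 have
# 1 real run plus 3 dummies each:
print(pass_step_widths([4, 1, 1, 1, 1, 1], [0, 3, 3, 3, 3, 3]))
# -> [1, 1, 1, 6]: several 1-way steps before a full-width merge
```

That would match a log showing a handful of 1-way steps followed by wider merges, though whether the 1-way steps are worth the I/O of copying a run through is a fair question.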
- Heikki
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers