On Thu, Sep 29, 2016 at 10:49 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>> Do I have that right? If so, this seems rather awkward. Hmm.
>
> How is it awkward?
Maybe that was the wrong choice of words. What I mean is that it seems
somewhat unprincipled to give over an equal share of memory to each
active-at-least-once tape, regardless of how much that matters in
practice. One tape could have several runs in memory at once, while
another holds only a fraction of a single much larger run. Maybe this is
just the first time that I find myself on the *other* side of a
discussion about an algorithm that seems brute-force compared to what it
might replace, but is actually better overall. :-) Now, that could be
something that I just need to get over.

In any case, I still think:

* Variables like maxTapes have a meaning that is directly traceable back
to Knuth's description of polyphase merge. I don't think that you should
do anything to them, on general principle.

* Everything or almost everything that you've added to mergeruns()
should probably be in its own dedicated function. This function can have
a comment where you acknowledge that it's not perfectly okay that you
claim memory per-tape, but it's simpler and faster overall.

* I think you should be looking at the number of active tapes, and not
state->Level or state->currentRun, in this new function. Actually, maybe
this wouldn't be the exact definition of an active tape that we
establish at the beginning of beginmerge() (that definition considers
tapes with dummy runs to be inactive for that merge), but it would look
similar. The general concern I have here is that you shouldn't rely on
state->Level or state->currentRun from a distance for the purposes of
determining which tapes need some batch memory. This is also where you
might say something like: "we don't bother with shifting memory around
tapes for each merge step, even though that would be fairer. That's why
we don't use the beginmerge() definition of activeTapes -- instead, we
use our own broader definition of the number of active tapes, one that
doesn't exclude dummy-run-tapes with real runs, making the allocation
reasonably sensible for every merge pass".

* The "batchUsed" terminology really isn't working here, AFAICT. For one
thing, you have two separate areas where caller tuples might reside: the
small per-tape buffers (sized MERGETUPLEBUFFER_SIZE per tape), and the
logtape.c-controlled preread buffers (sized up to MaxAllocSize per
tape). Which of these two things is batch memory? I think it might just
be the first one, but KiBs of memory do not suggest "batch" to me. Isn't
that really more like what you might call double buffer memory, used not
to save overhead from palloc (having many palloc headers in memory), but
rather to recycle memory efficiently? So, these two things should have
two new names of their own, I think, and neither should be called "batch
memory" IMV.

I see several assertions remaining here and there that were written with
my original definition of batch memory in mind -- assertions like:

        case TSS_SORTEDONTAPE:
            Assert(forward || state->randomAccess);
            Assert(!state->batchUsed);

(Isn't state->batchUsed always true now?) And:

        case TSS_FINALMERGE:
            Assert(forward);
            Assert(state->batchUsed || !state->tuples);

(Isn't state->tuples only really of interest to datum-tuple-case
routines, now that you've made everything use large logtape.c preread
buffers?)

* Is it really necessary for !state->tuples cases to do that
MERGETUPLEBUFFER_SIZE-based allocation? In other words, what need is
there for pass-by-value datum cases to have this scratch/double buffer
memory at all, since the value is returned to caller by-value, not
by-reference? This is related to the problem of it not being entirely
clear what batch memory now is, I think.
--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers