Re: [HACKERS] Using quicksort for every external sort run

Peter Geoghegan Sun, 13 Dec 2015 15:42:07 -0800

On Sat, Dec 12, 2015 at 4:41 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> Those usages make sense to me, as they are locally self-contained and
> it is clear what they are in contradistinction to.   But your usage is
> spread throughout (even in function names, not just comments) and
> seems to contradict the current usage as yours are not separately
> palloced, as the "proper" ones described here are.  I think that
> "proper" only works when the same comment also defines the
> alternative, rather than as some file-global description.  Maybe
> "pooltuple" rather than "tupleproper"


I don't think of it that way. The "tuple proper" is the thing that the
client passes to their tuplesort -- the thing they are actually
interested in having sorted. Like an IndexTuple for CREATE INDEX
callers, for example. SortTuple is just an internal implementation
detail. (That appears all over the file tuplesort.c, just as my new
references to "tuple proper" do. But neither appear elsewhere.)

>>> Also, if I am reading this correctly, when we refill a pool from a
>>> logical tape we still transform each tuple as it is read from the disk
>>> format to the memory format.  This inflates the size quite a bit, at
>>> least for single-datum tuples.  If we instead just read the disk
>>> format directly into the pool, and converted them into the in-memory
>>> format when each tuple came due for the merge heap, would that destroy
>>> the locality of reference you are seeking to gain?
>>
>> Are you talking about alignment?
>
> Maybe alignment, but also the size of the SortTuple struct itself,
> which is not present on tape but is present in memory if I understand
> correctly.
>
> When reading 128kb (32 blocks) worth of in-memory pool, it seems like
> it only gets to read 16 to 18 blocks of tape to fill them up, in the
> case of building an index on single column 32-byte random md5 digests.
> I don't exactly know where all of that space goes, I'm taking an
> experimentalist approach.

I'm confused.

readtup_datum(), just like every other READTUP() variant, has the new
function tupproperalloc() as a drop-in replacement for the master
branch palloc() + USEMEM() calls.

It is true that tupproperalloc() (and a couple of other places
relating to preloading) know *a little* about the usage pattern --
tupproperalloc() accepts a "tape number" argument to know what
partition within the large pool/buffer to use for each logical
allocation. However, from the point of view of correctness,
tupproperalloc() should function as a drop-in replacement for palloc()
+ USEMEM() calls in the context of the various READTUP() routines.

I have done nothing special with any particular READTUP() routine,
including readtup_datum() (all READTUP() routines have received the
same treatment). Nothing else was changed in those routines, including
how tuples are stored on tape. The datum case does kind of store the
SortTuples on tape today in one very limited sense, which is that the
length is stored fairly naively (that's already available from the
IndexTuple in the case of writetup_index(), for example, but length
must be stored explicitly for the datum case).

My guess is you're confusion comes from the fact that the memtuples
array (the array of SortTuple) is also factored in to memory
accounting, but that grows at geometric intervals, whereas the
existing READTUP() retail palloc() calls (and their USEMEM() memory
accounting calls) occur in drips and drabs. It's probably the case
that the sizing of the memtuples array and the amount of memory we use
for that rather than retail palloc()/"tuple proper" memory is a kind
of arbitrary (why should the needs be the same when SortTuples are
merge step "slots"?), but I don't think that's the biggest problem in
this general area at all.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Using quicksort for every external sort run

Reply via email to