Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Heikki Linnakangas Wed, 21 Dec 2016 04:05:25 -0800

On 12/21/2016 12:53 AM, Robert Haas wrote:

>> That leaves one problem, though: reusing space in the final merge phase. If
>> the tapes being merged belong to different LogicalTapeSets, and you create
>> one new tape to hold the result, the new tape cannot easily reuse the space
>> of the input tapes because they are on different tape sets.
>
> If the worker is always completely finished with the tape before the
> leader touches it, couldn't the leader's LogicalTapeSet just "adopt"
> the tape and overwrite it like any other?

Currently, the logical tape code assumes that all tapes in a single
LogicalTapeSet are allocated from the same BufFile. The logical tape's
on-disk format contains block numbers, to point to the next/prev block
of the tape [1], and they're assumed to refer to the same file. That
allows reusing space efficiently during the merge. After you have read
the first block from tapes A, B and C, you can immediately reuse those
three blocks for output tape D.
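
To make the reuse concrete, here is a rough toy model of the idea, not the actual logtape.c code. All names (ToyTapeSet, toy_read_block and so on) are made up for illustration, and it assumes the simpler next-pointer layout mentioned in [1]. Every block number refers to the same backing file, so a block that has been read can go straight onto a free list and be handed to whichever tape writes next:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64
#define NO_BLOCK   (-1L)

/* Hypothetical toy model of one tape set backed by one file. */
typedef struct ToyTapeSet
{
    long next_block[MAX_BLOCKS];  /* per-block pointer to the next block */
    long free_blocks[MAX_BLOCKS]; /* stack of recycled block numbers */
    int  nfree;                   /* number of entries in free_blocks */
    long nblocks;                 /* blocks ever allocated = file size */
} ToyTapeSet;

/* A tape is just a chain of blocks inside the shared file. */
typedef struct ToyTape
{
    long first; /* first unread block, or NO_BLOCK */
    long last;  /* last written block */
} ToyTape;

static void
toy_init(ToyTapeSet *ts)
{
    ts->nfree = 0;
    ts->nblocks = 0;
}

/* Prefer a recycled block; extend the file only if none is free. */
static long
toy_alloc_block(ToyTapeSet *ts)
{
    if (ts->nfree > 0)
        return ts->free_blocks[--ts->nfree];
    assert(ts->nblocks < MAX_BLOCKS);
    return ts->nblocks++;
}

/* Append one block to the end of a tape. */
static void
toy_append_block(ToyTapeSet *ts, ToyTape *t)
{
    long blk = toy_alloc_block(ts);

    ts->next_block[blk] = NO_BLOCK;
    if (t->first == NO_BLOCK)
        t->first = blk;
    else
        ts->next_block[t->last] = blk;
    t->last = blk;
}

/* Consume the first block of a tape and release it for reuse. */
static long
toy_read_block(ToyTapeSet *ts, ToyTape *t)
{
    long blk = t->first;

    if (blk == NO_BLOCK)
        return NO_BLOCK;
    t->first = ts->next_block[blk];
    assert(ts->nfree < MAX_BLOCKS);
    ts->free_blocks[ts->nfree++] = blk;
    return blk;
}
```

In this model, reading one block each from A, B and C leaves three entries on the free list, and the next three blocks appended to D come from there without growing the file.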

Now, if you read multiple tapes from different LogicalTapeSets, hence
backed by different BufFiles, you cannot reuse the space from those
different tapes for a single output tape, because the on-disk format
doesn't allow referring to blocks in other files. You could reuse the
space of *one* of the input tapes, by placing the output tape in the
same LogicalTapeSet, but not all of them.

We could enhance that, by using "filename + block number" instead of
just block number, in the pointers in the logical tapes. Then you could
spread one logical tape across multiple files. Probably not worth it in
practice, though.
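
Just to illustrate what such a pointer might look like, here is a hypothetical sketch. Nothing like this exists in the tree; the ToyBlockRef name, the integer file_id standing in for the filename, and the in-memory arrays standing in for files are all assumptions made for the example:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NFILES          3
#define TOY_BLOCKS_PER_FILE 8

/* Hypothetical "filename + block number" pointer. */
typedef struct ToyBlockRef
{
    int  file_id; /* which backing file; stand-in for a filename */
    long blockno; /* block number within that file */
} ToyBlockRef;

/* Toy storage: the next-pointer kept "in" every block of every file. */
typedef struct ToyStore
{
    ToyBlockRef next[TOY_NFILES][TOY_BLOCKS_PER_FILE];
} ToyStore;

/* Marks the end of a tape's block chain. */
static const ToyBlockRef TOY_END = {-1, -1L};

static bool
toy_ref_is_end(ToyBlockRef r)
{
    return r.file_id < 0;
}

/* Make block "from" point at block "to", possibly in another file. */
static void
toy_link(ToyStore *s, ToyBlockRef from, ToyBlockRef to)
{
    assert(from.file_id >= 0 && from.file_id < TOY_NFILES);
    assert(from.blockno >= 0 && from.blockno < TOY_BLOCKS_PER_FILE);
    s->next[from.file_id][from.blockno] = to;
}

/* Walk one logical tape, following pointers across files. */
static int
toy_chain_length(const ToyStore *s, ToyBlockRef head)
{
    int         n = 0;
    ToyBlockRef r = head;

    while (!toy_ref_is_end(r))
    {
        n++;
        r = s->next[r.file_id][r.blockno];
    }
    return n;
}
```

With a bare block number there is no file_id field, so every pointer is implicitly resolved against the tape set's own file; the extra field is what would let one chain hop from file 0 to file 1 to file 2.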

[1] As the code stands, there are no next/prev pointers, but a tree of
"indirect" blocks. But I'm planning to change that to simpler next/prev
pointers, in
https://www.postgresql.org/message-id/flat/55b3b7ae-8dec-b188-b8eb-e07604052351%40iki.fi


- Heikki



