On 12/21/2016 12:53 AM, Robert Haas wrote:
That leaves one problem, though: reusing space in the final merge phase. If
the tapes being merged belong to different LogicalTapeSets, and create one
new tape to hold the result, the new tape cannot easily reuse the space of
the input tapes because they are on different tape sets.

If the worker is always completely finished with the tape before the
leader touches it, couldn't the leader's LogicalTapeSet just "adopt"
the tape and overwrite it like any other?

Currently, the logical tape code assumes that all tapes in a single LogicalTapeSet are allocated from the same BufFile. The logical tape's on-disk format contains block numbers, to point to the next/prev block of the tape [1], and they're assumed to refer to the same file. That allows reusing space efficiently during the merge. After you have read the first block from tapes A, B and C, you can immediately reuse those three blocks for output tape D.

Now, if you read multiple tapes, from different LogicalTapeSet, hence backed by different BufFiles, you cannot reuse the space from those different tapes for a single output tape, because the on-disk format doesn't allow referring to blocks in other files. You could reuse the space of *one* of the input tapes, by placing the output tape in the same LogicalTapeSet, but not all of them.

We could enhance that, by using "filename + block number" instead of just block number, in the pointers in the logical tapes. Then you could spread one logical tape across multiple files. Probably not worth it in practice, though.


[1] As the code stands, there are no next/prev pointers, but a tree of "indirect" blocks. But I'm planning to change that to simpler next/prev pointers, in https://www.postgresql.org/message-id/flat/55b3b7ae-8dec-b188-b8eb-e07604052351%40iki.fi

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to