On Thu, Jan 18, 2018 at 9:22 AM, Robert Haas <robertmh...@gmail.com> wrote: > But, hang on a minute. Why do the workers need to wait for the leader > anyway? Can't they just exit once they're done sorting? I think the > original reason why this ended up in the patch is that we needed the > leader to assume ownership of the tapes to avoid having the tapes get > blown away when the worker exits. But, IIUC, with sharedfileset.c, > that problem no longer exists. The tapes are jointly owned by all of > the cooperating backends and the last one to detach from it will > remove them. So, if the worker sorts, advertises that it's done in > shared memory, and exits, then nothing should get removed and the > leader can adopt the tapes whenever it gets around to it.
BTW, I want to point out that using the shared fileset infrastructure is only a very small impediment to adding randomAccess support. If we really wanted to support randomAccess for the leader's tapeset, while recycling blocks from worker BufFiles, it looks like all we'd have to do is change PathNameOpenTemporaryFile() to open files O_RDWR, rather than O_RDONLY (shared fileset BufFiles that are opened after export always have O_RDONLY segments -- we'd also have to change some assertions, as well as some comments). Overall, this approach looks straightforward, and isn't something that I can find an issue with after an hour or so of manual testing. Now, I'm not actually suggesting we go that way. As you know, randomAccess isn't used by CREATE INDEX, and randomAccess may never be needed for any parallel sort operation. More importantly, Thomas made PathNameOpenTemporaryFile() use O_RDONLY for a reason, and I don't want to trade one special case (randomAccess disallowed for parallel tuplesort leader tapeset) in exchange for another one (the logtape.c calls to BufFileOpenShared() ask for read-write BufFiles, not read-only BufFiles). I'm pointing this out because this is something that should increase confidence in the changes I've proposed to logtape.c. The fact that randomAccess support *would* be straightforward is a sign that I haven't accidentally introduced some other assumption, or special case. -- Peter Geoghegan