On Tue, Dec 20, 2016 at 8:14 PM, Peter Geoghegan <p...@heroku.com> wrote: > Without meaning to sound glib, unification is the process by which > parallel CREATE INDEX has the leader read temp files from workers > sufficient to complete its final on-the-fly merge.
That's not glib, but you can't in the end define BufFile unification in terms of what parallel CREATE INDEX needs. Whatever changes we make to lower-level abstractions in the service of some higher-level goal need to be explainable on their own terms. >>It's fine to make up new words -- indeed, in some sense that is the essence >> of writing any complex problem -- but you have to define them. > > I invite you to help me define this new word. If at some point I'm able to understand what it means, I'll try to do that. I think you're loosely using "unification" to mean combining stuff from different backends in some way that depends on the particular context, so that "BufFile unification" can be different from "LogicalTape unification". But that's just punting the question of what each of those things actually are. > Maybe it's inferior to that, but I think what Heikki proposes is more > or less complementary to what I've proposed, and has nothing to do > with resource management and plenty to do with making the logtape.c > interface look nice, AFAICT. It's also about refactoring/simplifying > logtape.c itself, while we're at it. I believe that Heikki has yet to > comment either way on my approach to resource management, one aspect > of the patch that I was particularly keen on your looking into. My reading of Heikki's point was that there's not much point in touching the BufFile level of things if we can do all of the necessary stuff at the LogicalTape level, and I agree with him about that. If a shared BufFile had a shared read-write pointer, that would be a good justification for having it. But it seems like unification at the BufFile level is just concatenation, and that can be done just as well at the LogicalTape level, so why tinker with BufFile? As I've said, I think there's some low-level hacking needed here to make sure files get removed at the correct time in all cases, but apart from that I see no good reason to push the concatenation operation all the way down into BufFile. > The theory of operation here is that workers own their own BufFiles, > and are responsible for deleting them when they die. The assumption, > rightly or wrongly, is that it's sufficient that workers flush > everything out (write out temp files), and yield control to the > leader, which will open their temp files for the duration of the > leader final on-the-fly merge. The resource manager in the leader > knows it isn't supposed to ever delete worker-owned files (just > close() the FDs), and the leader errors if it cannot find temp files > that match what it expects. If there is a an error in the leader, it > shuts down workers, and they clean up, more than likely. If there is > an error in the worker, or if the files cannot be deleted (e.g., if > there is a classic hard crash scenario), we should also be okay, > because nobody will trip up on some old temp file from some worker, > since fd.c has some gumption about what workers need to do (and what > the leader needs to avoid) in the event of a hard crash. I don't see a > risk of file descriptor leaks, which may or may not have been part of > your concern (please clarify). I don't think there's any new issue with file descriptor leaks here, but I think there is a risk of calling unlink() too early or too late with your design. My proposal was an effort to nail that down real tight. >> If the worker is always completely finished with the tape before the >> leader touches it, couldn't the leader's LogicalTapeSet just "adopt" >> the tape and overwrite it like any other? > > I'll remind you that parallel CREATE INDEX doesn't actually ever need > to be randomAccess, and so we are not actually going to ever need to > do this as things stand. I wrote the code that way in order to not > break the existing interface, which seemed like a blocker to posting > the patch. I am open to the idea of such an "adoption" occurring, even > though it actually wouldn't help any case that exists in the patch as > proposed. I didn't go that far in part because it seemed premature, > given that nobody had looked at my work to date at the time, and given > the fact that there'd be no initial user-visible benefit, and given > how the exact meaning of "unification" was (and is) somewhat in flux. > > I see no good reason to not do that, although that might change if I > actually seriously undertook to teach the leader about this kind of > "adoption". I suspect that the interface specification would make for > confusing reading, which isn't terribly appealing, but I'm sure I > could manage to make it work given time. I think the interface is pretty clear: the worker's logical tapes get incorporated into the leader's LogicalTapeSet as if they'd been there all along. After all, by the time this is happening, IIUC (please confirm), the worker is done with those tapes and will never read or modify them again. If that's right, the worker just needs a way to identify those tapes to the leader, which can then add them to its LogicalTapeSet. That's it. It needs a way to identify them, but I think that shouldn't be hard; in fact, I think your patch has something like that already. And it needs to make sure that the files get removed at the right time, but I already sketched a solution to that problem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers