* Tom Lane (t...@sss.pgh.pa.us) wrote:
> But today I thought of another way: suppose that we teach the postmaster
> to commit hara-kiri if the $PGDATA directory goes away. Since the
> buildfarm script definitely does remove all the temporary data directories
> it creates, this ought to get the job done.

Yes, please.

> An easy way to do that would be to have it check every so often if
> pg_control can still be read. We should not have it fail on ENFILE or
> EMFILE, since that would create a new failure hazard under heavy load,
> but ENOENT or similar would be reasonable grounds for deciding that
> something is horribly broken. (At least on Windows, failing on EPERM
> doesn't seem wise either, since we've seen antivirus products randomly
> causing such errors.)

Sounds pretty reasonable to me.

> I wouldn't want to do this every time through the postmaster's main loop,
> but we could do this once an hour for no added cost by adding the check
> where it does TouchSocketLockFiles; or once every few minutes if we
> carried a separate variable like last_touch_time. Once an hour would be
> plenty to fix the buildfarm's problem, I should think.

I have a bad (?) habit of doing exactly this during development and
would really like it to be a bit more often than once/hour, unless
there's a particular problem with that.

> Another question is what exactly "commit hara-kiri" should consist of.
> We could just abort() or _exit(1) and leave it to child processes to
> notice that the postmaster is gone, or we could make an effort to clean
> up. I'd be a bit inclined to treat it like a SIGQUIT situation, ie
> kill all the children and exit. The children are probably having
> problems of their own if the data directory's gone, so forcing
> termination might be best to keep them from getting stuck.

I like the idea of killing all the children and then exiting.

> Also, perhaps we'd only enable this behavior in --enable-cassert builds,
> to avoid any risk of a postmaster incorrectly choosing to suicide in a
> production scenario. Or maybe that's overly conservative.

That would work for my use-case. Perhaps only on --enable-cassert
builds for back-branches but enable it in master and see how things go
for 9.6? I agree that it feels overly conservative, but given our
recent history, we should be overly cautious with the back branches.

> Thoughts?

Thanks!

Stephen
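
For concreteness, the check discussed above could look roughly like the following. This is only a standalone sketch, not the postmaster's actual code: the function names, the five-minute interval, and the choice of ENOTDIR as one of the "or similar" errors are all assumptions made for illustration. It treats ENOENT-style failures as "data directory is gone" and deliberately ignores ENFILE, EMFILE, and EPERM, and it uses a separate timestamp (in the spirit of last_touch_time) to run more often than once an hour.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical interval; the thread only says "every few minutes". */
#define DATADIR_CHECK_INTERVAL_SECS (5 * 60)

/* Hypothetical counterpart to last_touch_time. */
static time_t last_datadir_check = 0;

/*
 * Decide whether an errno from open() is grounds for concluding that the
 * data directory has been removed.  ENOENT is named in the thread; ENOTDIR
 * is an assumed example of "or similar".  ENFILE, EMFILE, EPERM, and
 * anything else are treated as possibly transient and therefore ignored.
 */
static bool
errno_means_datadir_gone(int err)
{
    switch (err)
    {
        case ENOENT:
        case ENOTDIR:
            return true;
        default:
            return false;
    }
}

/*
 * Try to open the control file read-only.  Returns false only when the
 * failure clearly indicates that the file or its directory is missing.
 */
static bool
datadir_still_present(const char *control_path)
{
    int         fd;

    assert(control_path != NULL);

    fd = open(control_path, O_RDONLY);
    if (fd >= 0)
    {
        close(fd);
        return true;
    }
    return !errno_means_datadir_gone(errno);
}

/*
 * Rate limiter: returns true at most once per interval, so the caller can
 * invoke it on every pass through its main loop without extra cost.
 */
static bool
datadir_check_due(time_t now)
{
    if (now - last_datadir_check < DATADIR_CHECK_INTERVAL_SECS)
        return false;
    last_datadir_check = now;
    return true;
}
```

A main loop would call datadir_check_due(time(NULL)) on each iteration and, when it returns true and datadir_still_present() returns false, take whatever shutdown path is chosen (for example, the SIGQUIT-like "kill all the children and exit" behavior suggested above).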