On 4 July 2011 16:53, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > Ok, here's a new patch, addressing the issues Fujii raised, and with a bunch > of stylistic changes of my own. Also, I committed a patch to remove > silent_mode, so the fork_process() changes are now gone. I'm going to sleep > over this and review once again tomorrow, and commit if it still looks good > to me and no-one else reports new issues.
Looks good. > I don't like the names POSTMASTER_FD_WATCH and POSTMASTER_FD_OWN. At a quick > glance, it's not at all clear which is which. I couldn't come up with better > names, so for now I just added some comments to clarify that. I would find > WRITE/READ more clear, but to make sense of that you need to how the pipe is > used. Any suggestions or opinions on that? We could bikeshed about that until the cows come home, but we're not going to come up with names that make the purpose of each evident at a glance - it's too involved. If we could, we would have thought of them already. Besides, I've probably already written all the client code those macros will ever have. On 4 July 2011 17:36, Florian Pflug <f...@phlo.org> wrote: > On Jul4, 2011, at 17:53 , Heikki Linnakangas wrote: >>> Under Linux, select() may report a socket file descriptor as "ready >>> for >>> reading", while nevertheless a subsequent read blocks. This could >>> for >>> example happen when data has arrived but upon examination has >>> wrong >>> checksum and is discarded. There may be other circumstances in which >>> a >>> file descriptor is spuriously reported as ready. Thus it may be >>> safer >>> to use O_NONBLOCK on sockets that should not block. >> >> So in theory, on Linux you might WaitLatch might sometimes incorrectly >> return WL_POSTMASTER_DEATH. None of the callers check for >> WL_POSTMASTER_DEATH return code, they call PostmasterIsAlive() before >> assuming the postmaster has died, so that won't affect correctness at the >> moment. I doubt that scenario can even happen in our case, select() on a >> pipe that is never written to. But maybe we should add add an assertion to >> WaitLatch to assert that if select() reports that the postmaster pipe has >> been closed, PostmasterIsAlive() also returns false. > > The correct solution would be to read() from the pipe after select() > returns, and only return WL_POSTMASTER_DEATCH if the read doesn't return > EAGAIN. To prevent that read() from blocking if the read event was indeed > spurious, O_NONBLOCK must be set on the pipe but that patch does that already. Let's have some perspective on this. We're talking about a highly doubtful chance that latches may wake when they shouldn't. Latches are typically expected to wake up for a variety of reasons, and if that occurred in the archiver's case with my patch applied, I think we'd just go asleep again without anything happening. It seems likely that latch client code in general will never trip up on this, as long as its not exclusively relying on the waitLatch() return value to report pm death. Maybe we should restore the return value of WaitLatch to its previous format (so it doesn't return a bitmask)? That way, we don't report that the Postmaster died, and therefore clients are required to call PostmasterIsAlive() to be sure. In any case, I'm in favour of the assertion. > Btw, with the death-watch / life-sign / whatever infrastructure in place, > shouldn't PostmasterIsAlive() be using that instead of getppid() / kill(0)? Hmm, maybe. That seems like a separate issue though, that can be addressed with another patch. It does have the considerable disadvantage of making Heikki's proposed assertion failure useless. Is the implementation of PostmasterIsAlive() really a problem at the moment? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers