On Sat, Sep 17, 2016 at 6:23 AM, Andres Freund <and...@anarazel.de> wrote: > On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote: >> On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote: >> > Yikes, that's a pretty absurd implementation. >> >> Not when you take into account that it's been written over 20 years ago ;) > > Well, that doesn't mean it can't be fixed ;) > >> > I'm not quite sure I understand why this an issue here - there shouldn't >> > ever be events on this fd, so why is the kernel waking up all processes? >> > It'd kinda makes sense it'd wake up all processes if there's one >> > waiting, but ... ? >> >> Every read is an event, and that's what PostmasterIsAlive does. > > But in most places we only do a PostmasterIsAlive if WaitLatch returns > WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is > WalRcvWaitForStartPosition(). If that's indeed the cause of your issues > this quite possibly could be fixed by doing the > if (!PostmasterIsAlive()) > exit(1); > check not unconditionally, but only after the WaitLatch at the end of > the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()? > That'll be a minor behaviour change for the WALRCV_RESTARTING, but that > seems fine, we'll just loop once more outside (after a quick glance at > least).
Yeah, I wondered why that was different than the pattern established elsewhere when I was hacking on replication code. There are actually several places where we call PostmasterIsAlive() unconditionally in a loop that waits for WL_POSTMASTER_DEATH but ignores the return code: in pgarch.c, syncrep.c, walsender.c and walreceiver.c. Should we just change them all to check the return code and exit/break/ereport/etc as appropriate? That would match the code from autovacuum.c, checkpointer.c, pgstat.c, be-secure.c and bgworker.c. Something like the attached. The code in basebackup.c also waits for WL_POSTMASTER_DEATH but doesn't check for it in the return value *or* call PostmasterIsAlive(). I'm not sure what to make of that. I didn't test it but it looks like maybe it would continue running after postmaster death but not honour the throttling rate limit because WaitLatch would keep returning immediately. -- Thomas Munro http://www.enterprisedb.com
check-return-code-for-postmaster-death.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers