On Fri, Oct 3, 2014 at 1:09 PM, Robert Haas <robertmh...@gmail.com> wrote: > Further debugging reveals that sigusr1_handler() gets called > repeatedly, to start autovacuum workers, and it keeps waking up and > starting them. But that doesn't cause the background workers to get > started either, because although sigusr1_handler() contains a call to > maybe_start_bgworker, it only does that if start_bgworker = true, > which only happens if > CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE) is true, > which on these calls it isn't. > And I think this might also be the missing ingredient answering > Andres's question from before: why doesn't the 60-second > select()-timeout cause the background worker to eventually start even > if the SELECT doesn't get interrupted? There seems to be a SIGUSR1 > arriving about every 3 seconds, and I bet that's resetting the timer > every time.
For now I propose to address this by committing the attached patch, which gets rid of the separate start_bgworker flag inside sigusr1_handler() and instead uses the same StartWorkerNeeded || HaveCrashedWorker test that we use inside ServerLoop() to decide whether to call maybe_start_bgworker(). Either more signals will arrive (in which case the signal handler will launch an additional background worker every time a signal arrives) or they won't (in which case the 60-second timeout will eventually expire, and ServerLoop() will kick into high gear and satisfy all outstanding requests). This isn't really right, because there might still be a quite noticeable delay waiting for workers to get launched, but at least the delay would be bounded to at most 60 seconds rather than, as at present, potentially infinite. A totally correct fix will require a bit more thought. A search of the git history reveals that the problem of a signal restarting the timeout is not new: Tom fixed a similar problem back in 2007 by making the autovacuum launcher sleep for at most a second at a time. Such a fix isn't ideal here, because we really don't want an up-to-1-second delay launching a newly-registered background worker if there's a way to avoid that -- it's probably OK for launching daemons, but it's not so hot for parallel query. However, we could: (1) Use the self-pipe trick. We could not directly use a latch, at least not without a new API, because we might be waiting on more than one socket. (2) Have the postmaster not set SA_RESTART for the sigusr1 handler. I don't know how platform-independent this approach would be. (3) Have sigusr1_handler repeatedly call maybe_start_bgworker() until StartWorkerNeeded becomes false, instead of just calling it once. ServerLoop() is carefully coded to call maybe_start_bgworker() just once per iteration, presumably to make sure the server stays responsive even if the bgworker-starting machinery is quite busy; looping inside the signal handler would give up that nice property unless we had some way to break out of the loop if there's activity on the socket. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index ce920ab..6220a8e 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -4752,7 +4752,6 @@ static void sigusr1_handler(SIGNAL_ARGS) { int save_errno = errno; - bool start_bgworker = false; PG_SETMASK(&BlockSig); @@ -4760,7 +4759,7 @@ sigusr1_handler(SIGNAL_ARGS) if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE)) { BackgroundWorkerStateChange(); - start_bgworker = true; + StartWorkerNeeded = true; } /* @@ -4801,10 +4800,10 @@ sigusr1_handler(SIGNAL_ARGS) pmState = PM_HOT_STANDBY; /* Some workers may be scheduled to start now */ - start_bgworker = true; + StartWorkerNeeded = true; } - if (start_bgworker) + if (StartWorkerNeeded || HaveCrashedWorker) maybe_start_bgworker(); if (CheckPostmasterSignal(PMSIGNAL_WAKEN_ARCHIVER) &&
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers