Hari Babu <haribabu.ko...@huawei.com> writes:
>> We're going to need more details about how to reproduce this.
> The problem occurs only when the active server is restarted after just adding a
> recovery.conf file to the data directory.

Well, you can't just put an empty file there, but I eventually managed to
reproduce this with the suggested hack in xlog.c.

I think the key problem is that postmaster.c's sigusr1_handler() is willing
to start new children even after shutdown has been initiated. I don't see
any good reason for it to do that, so I think the appropriate patch is as
attached.

Changing that still leaves us with the postmaster thinking that the eventual
exit(1) of the startup process is a "crash". This is mostly cosmetic, since
it still shuts down okay, but we can fix it by reversing the order of the
first two checks in reaper(): if Shutdown is set, we should prefer that code
path even if we're in PM_STARTUP state.

I concluded that it probably wasn't a good idea to have the additional state
transition in SIGINT handling. Generally PM_STARTUP means "we're running the
startup process and nothing else", and that's useful state info that we
shouldn't throw away lightly.

			regards, tom lane
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b223feefbab0645667449f643c6c8adee3747ef0..6f93d93fa3f7577fb9157f0bea805c427e3605dd 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** pmdie(SIGNAL_ARGS)
*** 2261,2269 ****
  			if (pmState == PM_RECOVERY)
  			{
  				/*
! 				 * Only startup, bgwriter, and checkpointer should be active
! 				 * in this state; we just signaled the first two, and we don't
! 				 * want to kill checkpointer yet.
  				 */
  				pmState = PM_WAIT_BACKENDS;
  			}
--- 2261,2269 ----
  			if (pmState == PM_RECOVERY)
  			{
  				/*
! 				 * Only startup, bgwriter, walreceiver, and/or checkpointer
! 				 * should be active in this state; we just signaled the first
! 				 * three, and we don't want to kill checkpointer yet.
  				 */
  				pmState = PM_WAIT_BACKENDS;
  			}
*************** reaper(SIGNAL_ARGS)
*** 2355,2360 ****
--- 2355,2372 ----
  			StartupPID = 0;
  
  			/*
+ 			 * Startup process exited in response to a shutdown request (or it
+ 			 * completed normally regardless of the shutdown request).
+ 			 */
+ 			if (Shutdown > NoShutdown &&
+ 				(EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
+ 			{
+ 				pmState = PM_WAIT_BACKENDS;
+ 				/* PostmasterStateMachine logic does the rest */
+ 				continue;
+ 			}
+ 
+ 			/*
  			 * Unexpected exit of startup process (including FATAL exit)
  			 * during PM_STARTUP is treated as catastrophic. There are no
  			 * other processes running yet, so we can just exit.
*************** reaper(SIGNAL_ARGS)
*** 2369,2386 ****
  			}
  
  			/*
- 			 * Startup process exited in response to a shutdown request (or it
- 			 * completed normally regardless of the shutdown request).
- 			 */
- 			if (Shutdown > NoShutdown &&
- 				(EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
- 			{
- 				pmState = PM_WAIT_BACKENDS;
- 				/* PostmasterStateMachine logic does the rest */
- 				continue;
- 			}
- 
- 			/*
  			 * After PM_STARTUP, any unexpected exit (including FATAL exit) of
  			 * the startup process is catastrophic, so kill other children,
  			 * and set RecoveryError so we don't try to reinitialize after
--- 2381,2386 ----
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4283,4289 ****
  	 * first. We don't want to go back to recovery in that case.
  	 */
  	if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
! 		pmState == PM_STARTUP)
  	{
  		/* WAL redo has started. We're out of reinitialization. */
  		FatalError = false;
--- 4283,4289 ----
  	 * first. We don't want to go back to recovery in that case.
  	 */
  	if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
! 		pmState == PM_STARTUP && Shutdown == NoShutdown)
  	{
  		/* WAL redo has started. We're out of reinitialization. */
  		FatalError = false;
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4300,4306 ****
  		pmState = PM_RECOVERY;
  	}
  	if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
! 		pmState == PM_RECOVERY)
  	{
  		/*
  		 * Likewise, start other special children as needed.
--- 4300,4306 ----
  		pmState = PM_RECOVERY;
  	}
  	if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
! 		pmState == PM_RECOVERY && Shutdown == NoShutdown)
  	{
  		/*
  		 * Likewise, start other special children as needed.
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4331,4337 ****
  		signal_child(SysLoggerPID, SIGUSR1);
  	}
  
! 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER))
  	{
  		/*
  		 * Start one iteration of the autovacuum daemon, even if autovacuuming
--- 4331,4338 ----
  		signal_child(SysLoggerPID, SIGUSR1);
  	}
  
! 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
! 		Shutdown == NoShutdown)
  	{
  		/*
  		 * Start one iteration of the autovacuum daemon, even if autovacuuming
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4345,4351 ****
  		start_autovac_launcher = true;
  	}
  
! 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER))
  	{
  		/* The autovacuum launcher wants us to start a worker process. */
  		StartAutovacuumWorker();
--- 4346,4353 ----
  		start_autovac_launcher = true;
  	}
  
! 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
! 		Shutdown == NoShutdown)
  	{
  		/* The autovacuum launcher wants us to start a worker process. */
  		StartAutovacuumWorker();
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4354,4360 ****
  	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
  		WalReceiverPID == 0 &&
  		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
! 		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY))
  	{
  		/* Startup Process wants us to start the walreceiver process. */
  		WalReceiverPID = StartWalReceiver();
--- 4356,4363 ----
  	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
  		WalReceiverPID == 0 &&
  		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
! 		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
! 		Shutdown == NoShutdown)
  	{
  		/* Startup Process wants us to start the walreceiver process. */
  		WalReceiverPID = StartWalReceiver();
-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)