Hari Babu <haribabu.ko...@huawei.com> writes:
>> We're going to need more details about how to reproduce this.

> The problem occurs only when an active server is restarted after just adding a
> recovery.conf file to the data directory.

Well, you can't just put an empty file there, but I eventually managed
to reproduce this with the suggested hack in xlog.c.

I think the key problem is that postmaster.c's sigusr1_handler() is
willing to start new children even after shutdown has been initiated.
I don't see any good reason for it to do that, so I think the
appropriate patch is as attached.
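
To make the shape of that guard concrete, here is a minimal standalone sketch of the walreceiver case. It is not the postmaster code: the enums are simplified stand-ins, and the function names (walreceiver_start_allowed_old, walreceiver_start_allowed_new, check_walreceiver_guard) are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for postmaster state; NOT the real definitions. */
typedef enum { NoShutdown, SmartShutdown, FastShutdown, ImmediateShutdown } ShutdownMode;
typedef enum { PM_STARTUP, PM_RECOVERY, PM_HOT_STANDBY, PM_WAIT_READONLY, PM_WAIT_BACKENDS } PMState;

/* Pre-patch condition: only the postmaster state and the PID are consulted. */
static bool
walreceiver_start_allowed_old(bool requested, int walreceiver_pid, PMState state)
{
    return requested && walreceiver_pid == 0 &&
        (state == PM_STARTUP || state == PM_RECOVERY ||
         state == PM_HOT_STANDBY || state == PM_WAIT_READONLY);
}

/* Patched condition: same test, but refused once a shutdown is pending. */
static bool
walreceiver_start_allowed_new(bool requested, int walreceiver_pid,
                              PMState state, ShutdownMode shutdown)
{
    return walreceiver_start_allowed_old(requested, walreceiver_pid, state) &&
        shutdown == NoShutdown;
}

/* Hypothetical self-check: a start request arriving after a fast shutdown. */
static void
check_walreceiver_guard(void)
{
    /* The old condition cannot see the shutdown, so it still says "start". */
    assert(walreceiver_start_allowed_old(true, 0, PM_STARTUP));
    /* The patched condition declines while a shutdown is in progress... */
    assert(!walreceiver_start_allowed_new(true, 0, PM_STARTUP, FastShutdown));
    /* ...and behaves as before when no shutdown has been requested. */
    assert(walreceiver_start_allowed_new(true, 0, PM_STARTUP, NoShutdown));
}
```

The other conditions touched in sigusr1_handler() get the same extra "Shutdown == NoShutdown" term; only the walreceiver one is modeled here.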

Changing that still leaves us with the postmaster thinking that the
eventual exit(1) of the startup process is a "crash".  This is mostly
cosmetic since it still shuts down okay, but we can fix it by reversing
the order of the first two checks in reaper() --- that is, if Shutdown
is set, we should prefer that code path even if we're in PM_STARTUP
state.
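
The effect of swapping those two checks can be seen in a small standalone model. This is a sketch under assumptions, not the reaper() code: exit statuses are reduced to plain integers, the PM_STARTUP failure test is assumed to be "any nonzero exit", and classify_startup_exit() and the StartupExit enum are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for postmaster state; NOT the real definitions. */
typedef enum { NoShutdown, SmartShutdown, FastShutdown } ShutdownMode;
typedef enum { PM_STARTUP, PM_RECOVERY, PM_WAIT_BACKENDS } PMState;

/* How the postmaster would interpret the startup process's exit. */
typedef enum {
    STARTUP_EXIT_NORMAL,        /* exit(0), nothing special going on */
    STARTUP_EXIT_SHUTDOWN,      /* exit explained by a pending shutdown */
    STARTUP_EXIT_BOOT_FAILURE,  /* failure while still in PM_STARTUP */
    STARTUP_EXIT_CRASH          /* unexpected failure after PM_STARTUP */
} StartupExit;

/*
 * Classify the exit of the startup process.  shutdown_check_first selects
 * the patched ordering (true) or the original ordering (false).
 */
static StartupExit
classify_startup_exit(PMState state, ShutdownMode shutdown,
                      int exit_code, bool shutdown_check_first)
{
    bool shutdown_explains_exit =
        shutdown > NoShutdown && (exit_code == 0 || exit_code == 1);
    bool failed_in_startup = state == PM_STARTUP && exit_code != 0;

    if (shutdown_check_first)
    {
        if (shutdown_explains_exit)
            return STARTUP_EXIT_SHUTDOWN;
        if (failed_in_startup)
            return STARTUP_EXIT_BOOT_FAILURE;
    }
    else
    {
        if (failed_in_startup)
            return STARTUP_EXIT_BOOT_FAILURE;
        if (shutdown_explains_exit)
            return STARTUP_EXIT_SHUTDOWN;
    }
    return exit_code != 0 ? STARTUP_EXIT_CRASH : STARTUP_EXIT_NORMAL;
}

/* Hypothetical self-check: exit(1) during PM_STARTUP with a shutdown pending. */
static void
check_reaper_ordering(void)
{
    /* Original order: reported as a startup failure. */
    assert(classify_startup_exit(PM_STARTUP, FastShutdown, 1, false)
           == STARTUP_EXIT_BOOT_FAILURE);
    /* Patched order: recognized as a response to the shutdown request. */
    assert(classify_startup_exit(PM_STARTUP, FastShutdown, 1, true)
           == STARTUP_EXIT_SHUTDOWN);
}
```

With no shutdown pending, both orderings give the same answer in this model, so the reordering only changes the case described above.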

I concluded that it probably wasn't a good idea to have the additional
state transition in SIGINT handling.  Generally PM_STARTUP means "we're
running the startup process and nothing else", and that's useful state
info that we shouldn't throw away lightly.


                        regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b223feefbab0645667449f643c6c8adee3747ef0..6f93d93fa3f7577fb9157f0bea805c427e3605dd 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** pmdie(SIGNAL_ARGS)
*** 2261,2269 ****
  			if (pmState == PM_RECOVERY)
  			{
  				/*
! 				 * Only startup, bgwriter, and checkpointer should be active
! 				 * in this state; we just signaled the first two, and we don't
! 				 * want to kill checkpointer yet.
  				 */
  				pmState = PM_WAIT_BACKENDS;
  			}
--- 2261,2269 ----
  			if (pmState == PM_RECOVERY)
  			{
  				/*
! 				 * Only startup, bgwriter, walreceiver, and/or checkpointer
! 				 * should be active in this state; we just signaled the first
! 				 * three, and we don't want to kill checkpointer yet.
  				 */
  				pmState = PM_WAIT_BACKENDS;
  			}
*************** reaper(SIGNAL_ARGS)
*** 2355,2360 ****
--- 2355,2372 ----
  			StartupPID = 0;
  
  			/*
+ 			 * Startup process exited in response to a shutdown request (or it
+ 			 * completed normally regardless of the shutdown request).
+ 			 */
+ 			if (Shutdown > NoShutdown &&
+ 				(EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
+ 			{
+ 				pmState = PM_WAIT_BACKENDS;
+ 				/* PostmasterStateMachine logic does the rest */
+ 				continue;
+ 			}
+ 
+ 			/*
  			 * Unexpected exit of startup process (including FATAL exit)
  			 * during PM_STARTUP is treated as catastrophic. There are no
  			 * other processes running yet, so we can just exit.
*************** reaper(SIGNAL_ARGS)
*** 2369,2386 ****
  			}
  
  			/*
- 			 * Startup process exited in response to a shutdown request (or it
- 			 * completed normally regardless of the shutdown request).
- 			 */
- 			if (Shutdown > NoShutdown &&
- 				(EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
- 			{
- 				pmState = PM_WAIT_BACKENDS;
- 				/* PostmasterStateMachine logic does the rest */
- 				continue;
- 			}
- 
- 			/*
  			 * After PM_STARTUP, any unexpected exit (including FATAL exit) of
  			 * the startup process is catastrophic, so kill other children,
  			 * and set RecoveryError so we don't try to reinitialize after
--- 2381,2386 ----
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4283,4289 ****
  	 * first. We don't want to go back to recovery in that case.
  	 */
  	if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
! 		pmState == PM_STARTUP)
  	{
  		/* WAL redo has started. We're out of reinitialization. */
  		FatalError = false;
--- 4283,4289 ----
  	 * first. We don't want to go back to recovery in that case.
  	 */
  	if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
! 		pmState == PM_STARTUP && Shutdown == NoShutdown)
  	{
  		/* WAL redo has started. We're out of reinitialization. */
  		FatalError = false;
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4300,4306 ****
  		pmState = PM_RECOVERY;
  	}
  	if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
! 		pmState == PM_RECOVERY)
  	{
  		/*
  		 * Likewise, start other special children as needed.
--- 4300,4306 ----
  		pmState = PM_RECOVERY;
  	}
  	if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
! 		pmState == PM_RECOVERY && Shutdown == NoShutdown)
  	{
  		/*
  		 * Likewise, start other special children as needed.
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4331,4337 ****
  		signal_child(SysLoggerPID, SIGUSR1);
  	}
  
! 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER))
  	{
  		/*
  		 * Start one iteration of the autovacuum daemon, even if autovacuuming
--- 4331,4338 ----
  		signal_child(SysLoggerPID, SIGUSR1);
  	}
  
! 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
! 		Shutdown == NoShutdown)
  	{
  		/*
  		 * Start one iteration of the autovacuum daemon, even if autovacuuming
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4345,4351 ****
  		start_autovac_launcher = true;
  	}
  
! 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER))
  	{
  		/* The autovacuum launcher wants us to start a worker process. */
  		StartAutovacuumWorker();
--- 4346,4353 ----
  		start_autovac_launcher = true;
  	}
  
! 	if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
! 		Shutdown == NoShutdown)
  	{
  		/* The autovacuum launcher wants us to start a worker process. */
  		StartAutovacuumWorker();
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4354,4360 ****
  	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
  		WalReceiverPID == 0 &&
  		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
! 		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY))
  	{
  		/* Startup Process wants us to start the walreceiver process. */
  		WalReceiverPID = StartWalReceiver();
--- 4356,4363 ----
  	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
  		WalReceiverPID == 0 &&
  		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
! 		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
! 		Shutdown == NoShutdown)
  	{
  		/* Startup Process wants us to start the walreceiver process. */
  		WalReceiverPID = StartWalReceiver();
-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs