Re: Reorder shutdown sequence, to flush pgstats later

Andres Freund Fri, 24 Jan 2025 12:06:41 -0800

Hi,

Attached is a new version of this patchset:


- HandleFatalError now doesn't expect to be called if already in FatalError or
  ImmediateShutdown and contains a comment and assertions to that effect

- Fatal errors when in pmState > PM_WAIT_BACKENDS go to PM_WAIT_DEAD_END
  directly. The previous version had them go to PM_WAIT_BACKENDS, to save
  duplicating one line. The ConfigurePostmasterWaitSet(false) call is now
  duplicated, with a comment at both sites mentioning that.

- Typo, comment and commit message fixes/polishing.

Unless somebody argues against, I'm planning to push all but the last later
today, wait for the buildfarm to settle, and then push the last.  This is a
dependency of multiple other things, so it'd be good to get it in soon.


On 2025-01-15 08:38:33 +0000, Bertrand Drouvot wrote:
> On Tue, Jan 14, 2025 at 06:55:03PM -0500, Andres Freund wrote:
> > On 2025-01-14 13:06:31 +0000, Bertrand Drouvot wrote:
> > > === 4
> > > 
> > > +  * XXX: Is it worth inventing a different PMQUIT value
> > > +  * that signals that the cluster is in a bad state,
> > > +  * without a process having crashed?
> > >    */
> > > 
> > > That would be more "accurate". Something like PMQUIT_FORK_FAILURE or such.
> > 
> > For now I think I'll leave it as such, as we'd need to have a new message in
> > quickdie(). While the primary message for PMQUIT_FOR_CRASH message would
> > differ, the rest seems like it'd largely be duplicated:
> > 
> >             case PMQUIT_FOR_CRASH:
> >                     /* A crash-and-restart cycle is in progress */
> >                     ereport(WARNING_CLIENT_ONLY,
> >                                     (errcode(ERRCODE_CRASH_SHUTDOWN),
> >                                      errmsg("terminating connection because 
> > of crash of another server process"),
> >                                      errdetail("The postmaster has 
> > commanded this server process to roll back"
> >                                                        " the current 
> > transaction and exit, because another"
> >                                                        " server process 
> > exited abnormally and possibly corrupted"
> >                                                        " shared memory."),
> >                                      errhint("In a moment you should be 
> > able to reconnect to the"
> >                                                      " database and repeat 
> > your command.")));
> >                     break;
> > 
> > That seems like a separate change.  Particularly because I'm just working on
> > this as part of some nasty three layer deep dependency chain - aio is pretty
> > big on its own...
> 
> Fair enough. I'll look at the remaining pieces once this stuff goes in.

Cool. I did try writing the change, but it does indeed seem better as a
separate patch. Besides the error message, it also seems we ought to invent a
different ERRCODE, as neither ERRCODE_ADMIN_SHUTDOWN nor
ERRCODE_CRASH_SHUTDOWN seem appropriate.

Vaguely related and not at all crucial:

quickdie(), in the PMQUIT_FOR_CRASH, does
                                         errhint("In a moment you should be 
able to reconnect to the"
                                                         " database and repeat 
your command.")));

but in case of an immediate *restart* (via pg_ctl restart -m immediate) we'll
not have such hint. postmaster.c doesn't know it's an immediate restart, so
that's not surprising, but it still seems a bit weird. One would hope that
immediate restarts are more common than crashes...


> 
> > > But that would also need to call TerminateChildren() (a second time) 
> > > again after
> > > ConfigurePostmasterWaitSet() which is not ideal though (unless we move the
> > > TerminateChildren() calls in the switch or add a check on 
> > > PM_WAIT_DEAD_END in
> > > the first call).
> > 
> > Why would it need to?
> 
> Because I thought that TerminateChildren() needs to be called after
> ConfigurePostmasterWaitSet(false). If not, could new connections be accepted 
> in
> the window between TerminateChildren() and ConfigurePostmasterWaitSet(false)?

I don't think that can happen with the attached patch - the
ConfigurePostmasterWaitSet() is called before HandleFatalError() returns, so
no new connections can be accepted, as that happens only from
ServerLoop()->AcceptConnection().


> > > === 8
> > > 
> > > +   * Now that everyone important is gone
> > > 
> > > s/everyone important is/walsenders and archiver are also gone/?
> > 
> > I'd rather get away from these lists of specific processes - they end up 
> > being
> > noisy in the git history because the lines get updated to add/remove items 
> > (or
> > updates to them get missed).
> 
> Good point, that's more "difficult" to maintain. Though I'm not sure that
> "important" is the right wording. "Now that non dead processes have finished"
> maybe? Could be seen as a Nit too.

This specific case I just solved by referencing "the processes mentioned
above"...

Greetings,

Andres Freund

>From a611c406af152049b550c45dc77a4678d6fc5514 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Fri, 24 Jan 2025 11:32:03 -0500
Subject: [PATCH v5 1/7] checkpointer: Request checkpoint via latch instead of
 signal

The motivation for this change is that a future commit will use SIGINT for
another purpose (postmaster requesting WAL access to be shut down) and that
there no other signals that we could readily use (see code comment for the
reason why SIGTERM shouldn't be used). But it's also a tad nicer / more
efficient to use SetLatch(), as it avoids sending signals when checkpointer
already is busy.

Reviewed-by: Bertrand Drouvot <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
---
 src/backend/postmaster/checkpointer.c | 60 +++++++++------------------
 1 file changed, 19 insertions(+), 41 deletions(-)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 9bfd0fd665c..dd2c8376c6e 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -159,9 +159,6 @@ static bool ImmediateCheckpointRequested(void);
 static bool CompactCheckpointerRequestQueue(void);
 static void UpdateSharedMemoryConfig(void);
 
-/* Signal handlers */
-static void ReqCheckpointHandler(SIGNAL_ARGS);
-
 
 /*
  * Main entry point for checkpointer process
@@ -191,7 +188,7 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 	 * tell us it's okay to shut down (via SIGUSR2).
 	 */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, ReqCheckpointHandler); /* request checkpoint */
+	pqsignal(SIGINT, SIG_IGN);
 	pqsignal(SIGTERM, SIG_IGN); /* ignore SIGTERM */
 	/* SIGQUIT handler was already set up by InitPostmasterChild */
 	pqsignal(SIGALRM, SIG_IGN);
@@ -860,23 +857,6 @@ IsCheckpointOnSchedule(double progress)
 }
 
 
-/* --------------------------------
- *		signal handler routines
- * --------------------------------
- */
-
-/* SIGINT: set flag to run a normal checkpoint right away */
-static void
-ReqCheckpointHandler(SIGNAL_ARGS)
-{
-	/*
-	 * The signaling process should have set ckpt_flags nonzero, so all we
-	 * need do is ensure that our main loop gets kicked out of any wait.
-	 */
-	SetLatch(MyLatch);
-}
-
-
 /* --------------------------------
  *		communication with backends
  * --------------------------------
@@ -990,38 +970,36 @@ RequestCheckpoint(int flags)
 	SpinLockRelease(&CheckpointerShmem->ckpt_lck);
 
 	/*
-	 * Send signal to request checkpoint.  It's possible that the checkpointer
-	 * hasn't started yet, or is in process of restarting, so we will retry a
-	 * few times if needed.  (Actually, more than a few times, since on slow
-	 * or overloaded buildfarm machines, it's been observed that the
-	 * checkpointer can take several seconds to start.)  However, if not told
-	 * to wait for the checkpoint to occur, we consider failure to send the
-	 * signal to be nonfatal and merely LOG it.  The checkpointer should see
-	 * the request when it does start, with or without getting a signal.
+	 * Set checkpointer's latch to request checkpoint.  It's possible that the
+	 * checkpointer hasn't started yet, so we will retry a few times if
+	 * needed.  (Actually, more than a few times, since on slow or overloaded
+	 * buildfarm machines, it's been observed that the checkpointer can take
+	 * several seconds to start.)  However, if not told to wait for the
+	 * checkpoint to occur, we consider failure to set the latch to be
+	 * nonfatal and merely LOG it.  The checkpointer should see the request
+	 * when it does start, with or without the SetLatch().
 	 */
 #define MAX_SIGNAL_TRIES 600	/* max wait 60.0 sec */
 	for (ntries = 0;; ntries++)
 	{
-		if (CheckpointerShmem->checkpointer_pid == 0)
+		volatile PROC_HDR *procglobal = ProcGlobal;
+		ProcNumber	checkpointerProc = procglobal->checkpointerProc;
+
+		if (checkpointerProc == INVALID_PROC_NUMBER)
 		{
 			if (ntries >= MAX_SIGNAL_TRIES || !(flags & CHECKPOINT_WAIT))
 			{
 				elog((flags & CHECKPOINT_WAIT) ? ERROR : LOG,
-					 "could not signal for checkpoint: checkpointer is not running");
-				break;
-			}
-		}
-		else if (kill(CheckpointerShmem->checkpointer_pid, SIGINT) != 0)
-		{
-			if (ntries >= MAX_SIGNAL_TRIES || !(flags & CHECKPOINT_WAIT))
-			{
-				elog((flags & CHECKPOINT_WAIT) ? ERROR : LOG,
-					 "could not signal for checkpoint: %m");
+					 "could not notify checkpoint: checkpointer is not running");
 				break;
 			}
 		}
 		else
-			break;				/* signal sent successfully */
+		{
+			SetLatch(&GetPGProcByNumber(checkpointerProc)->procLatch);
+			/* notified successfully */
+			break;
+		}
 
 		CHECK_FOR_INTERRUPTS();
 		pg_usleep(100000L);		/* wait 0.1 sec, then retry */
-- 
2.48.1.76.g4e746b1a31.dirty

>From 7c643fe209afa70804388888c6be19167f8795be Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Mon, 13 Jan 2025 23:20:25 -0500
Subject: [PATCH v5 2/7] postmaster: Don't open-code TerminateChildren() in
 HandleChildCrash()

After removing the duplication no user of sigquit_child() remains, therefore
remove it.

Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
---
 src/backend/postmaster/postmaster.c | 49 +++++++----------------------
 1 file changed, 11 insertions(+), 38 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5488289a489..ea92fb56c80 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -243,6 +243,13 @@ bool		enable_bonjour = false;
 char	   *bonjour_name;
 bool		restart_after_crash = true;
 bool		remove_temp_files_after_crash = true;
+
+/*
+ * When terminating child processes after fatal errors, like a crash of a
+ * child process, we normally send SIGQUIT -- and most other comments in this
+ * file are written on the assumption that we do -- but developers might
+ * prefer to use SIGABRT to collect per-child core dumps.
+ */
 bool		send_abort_for_crash = false;
 bool		send_abort_for_kill = false;
 
@@ -424,7 +431,6 @@ static int	BackendStartup(ClientSocket *client_sock);
 static void report_fork_failure_to_client(ClientSocket *client_sock, int errnum);
 static CAC_state canAcceptConnections(BackendType backend_type);
 static void signal_child(PMChild *pmchild, int signal);
-static void sigquit_child(PMChild *pmchild);
 static bool SignalChildren(int signal, BackendTypeMask targetMask);
 static void TerminateChildren(int signal);
 static int	CountChildren(BackendTypeMask targetMask);
@@ -2701,32 +2707,12 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	/*
 	 * Signal all other child processes to exit.  The crashed process has
 	 * already been removed from ActiveChildList.
+	 *
+	 * We could exclude dead-end children here, but at least when sending
+	 * SIGABRT it seems better to include them.
 	 */
 	if (take_action)
-	{
-		dlist_iter	iter;
-
-		dlist_foreach(iter, &ActiveChildList)
-		{
-			PMChild    *bp = dlist_container(PMChild, elem, iter.cur);
-
-			/* We do NOT restart the syslogger */
-			if (bp == SysLoggerPMChild)
-				continue;
-
-			if (bp == StartupPMChild)
-				StartupStatus = STARTUP_SIGNALED;
-
-			/*
-			 * This backend is still alive.  Unless we did so already, tell it
-			 * to commit hara-kiri.
-			 *
-			 * We could exclude dead-end children here, but at least when
-			 * sending SIGABRT it seems better to include them.
-			 */
-			sigquit_child(bp);
-		}
-	}
+		TerminateChildren(send_abort_for_crash ? SIGABRT : SIGQUIT);
 
 	if (Shutdown != ImmediateShutdown)
 		FatalError = true;
@@ -3349,19 +3335,6 @@ signal_child(PMChild *pmchild, int signal)
 #endif
 }
 
-/*
- * Convenience function for killing a child process after a crash of some
- * other child process.  We apply send_abort_for_crash to decide which signal
- * to send.  Normally it's SIGQUIT -- and most other comments in this file are
- * written on the assumption that it is -- but developers might prefer to use
- * SIGABRT to collect per-child core dumps.
- */
-static void
-sigquit_child(PMChild *pmchild)
-{
-	signal_child(pmchild, (send_abort_for_crash ? SIGABRT : SIGQUIT));
-}
-
 /*
  * Send a signal to the targeted children.
  */
-- 
2.48.1.76.g4e746b1a31.dirty

>From 75c5eaa5c8a81885f8eba0b727e8cdeb85d84165 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Fri, 24 Jan 2025 11:40:50 -0500
Subject: [PATCH v5 3/7] postmaster: Don't repeatedly transition to crashing
 state

Previously HandleChildCrash() skipped logging and signalling child exits if
already in an immediate shutdown or FatalError, but still transitioned server
state in response to a crash. That's redundant.

In the other place we transition to FatalError, we do take care to not do so
when already in FatalError state.

To make it easier to combine different paths for entering FatalError state,
only do so once in HandleChildCrash().

Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
---
 src/backend/postmaster/postmaster.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ea92fb56c80..a24c0385fe6 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2685,8 +2685,6 @@ CleanupBackend(PMChild *bp,
 static void
 HandleChildCrash(int pid, int exitstatus, const char *procname)
 {
-	bool		take_action;
-
 	/*
 	 * We only log messages and send signals if this is the first process
 	 * crash and we're not doing an immediate shutdown; otherwise, we're only
@@ -2694,15 +2692,13 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	 * signaled children, nonzero exit status is to be expected, so don't
 	 * clutter log.
 	 */
-	take_action = !FatalError && Shutdown != ImmediateShutdown;
+	if (FatalError || Shutdown == ImmediateShutdown)
+		return;
 
-	if (take_action)
-	{
-		LogChildExit(LOG, procname, pid, exitstatus);
-		ereport(LOG,
-				(errmsg("terminating any other active server processes")));
-		SetQuitSignalReason(PMQUIT_FOR_CRASH);
-	}
+	LogChildExit(LOG, procname, pid, exitstatus);
+	ereport(LOG,
+			(errmsg("terminating any other active server processes")));
+	SetQuitSignalReason(PMQUIT_FOR_CRASH);
 
 	/*
 	 * Signal all other child processes to exit.  The crashed process has
@@ -2711,11 +2707,9 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 	 * We could exclude dead-end children here, but at least when sending
 	 * SIGABRT it seems better to include them.
 	 */
-	if (take_action)
-		TerminateChildren(send_abort_for_crash ? SIGABRT : SIGQUIT);
+	TerminateChildren(send_abort_for_crash ? SIGABRT : SIGQUIT);
 
-	if (Shutdown != ImmediateShutdown)
-		FatalError = true;
+	FatalError = true;
 
 	/* We now transit into a state of waiting for children to die */
 	if (pmState == PM_RECOVERY ||
-- 
2.48.1.76.g4e746b1a31.dirty

>From 8b1ffdd4b4b14aced5d78c854202737f9d5b57ed Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Tue, 14 Jan 2025 00:57:12 -0500
Subject: [PATCH v5 4/7] postmaster: Move code to switch into FatalError state
 into function

There are two places switching to FatalError mode, behaving somewhat
differently. An upcoming commit will introduce a third. That doesn't seem seem
like a good idea.

This commit just moves the FatalError related code from HandleChildCrash()
into its own function, a subsequent commit will evolve the state machine
change to be suitable for other callers.

Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
---
 src/backend/postmaster/postmaster.c | 74 ++++++++++++++++++++---------
 1 file changed, 51 insertions(+), 23 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a24c0385fe6..a92fe4240b2 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2674,40 +2674,35 @@ CleanupBackend(PMChild *bp,
 }
 
 /*
- * HandleChildCrash -- cleanup after failed backend, bgwriter, checkpointer,
- * walwriter, autovacuum, archiver, slot sync worker, or background worker.
+ * Transition into FatalError state, in response to something bad having
+ * happened. Commonly the caller will have logged the reason for entering
+ * FatalError state.
  *
- * The objectives here are to clean up our local state about the child
- * process, and to signal all other remaining children to quickdie.
- *
- * The caller has already released its PMChild slot.
+ * This should only be called when not already in FatalError or
+ * ImmediateShutdown state.
  */
 static void
-HandleChildCrash(int pid, int exitstatus, const char *procname)
+HandleFatalError(QuitSignalReason reason, bool consider_sigabrt)
 {
-	/*
-	 * We only log messages and send signals if this is the first process
-	 * crash and we're not doing an immediate shutdown; otherwise, we're only
-	 * here to update postmaster's idea of live processes.  If we have already
-	 * signaled children, nonzero exit status is to be expected, so don't
-	 * clutter log.
-	 */
-	if (FatalError || Shutdown == ImmediateShutdown)
-		return;
+	int			sigtosend;
+
+	Assert(!FatalError);
+	Assert(Shutdown != ImmediateShutdown);
+
+	SetQuitSignalReason(reason);
 
-	LogChildExit(LOG, procname, pid, exitstatus);
-	ereport(LOG,
-			(errmsg("terminating any other active server processes")));
-	SetQuitSignalReason(PMQUIT_FOR_CRASH);
+	if (consider_sigabrt && send_abort_for_crash)
+		sigtosend = SIGABRT;
+	else
+		sigtosend = SIGQUIT;
 
 	/*
-	 * Signal all other child processes to exit.  The crashed process has
-	 * already been removed from ActiveChildList.
+	 * Signal all other child processes to exit.
 	 *
 	 * We could exclude dead-end children here, but at least when sending
 	 * SIGABRT it seems better to include them.
 	 */
-	TerminateChildren(send_abort_for_crash ? SIGABRT : SIGQUIT);
+	TerminateChildren(sigtosend);
 
 	FatalError = true;
 
@@ -2727,6 +2722,39 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
 		AbortStartTime = time(NULL);
 }
 
+/*
+ * HandleChildCrash -- cleanup after failed backend, bgwriter, checkpointer,
+ * walwriter, autovacuum, archiver, slot sync worker, or background worker.
+ *
+ * The objectives here are to clean up our local state about the child
+ * process, and to signal all other remaining children to quickdie.
+ *
+ * The caller has already released its PMChild slot.
+ */
+static void
+HandleChildCrash(int pid, int exitstatus, const char *procname)
+{
+	/*
+	 * We only log messages and send signals if this is the first process
+	 * crash and we're not doing an immediate shutdown; otherwise, we're only
+	 * here to update postmaster's idea of live processes.  If we have already
+	 * signaled children, nonzero exit status is to be expected, so don't
+	 * clutter log.
+	 */
+	if (FatalError || Shutdown == ImmediateShutdown)
+		return;
+
+	LogChildExit(LOG, procname, pid, exitstatus);
+	ereport(LOG,
+			(errmsg("terminating any other active server processes")));
+
+	/*
+	 * Switch into error state. The crashed process has already been removed
+	 * from ActiveChildList.
+	 */
+	HandleFatalError(PMQUIT_FOR_CRASH, true);
+}
+
 /*
  * Log the death of a child process.
  */
-- 
2.48.1.76.g4e746b1a31.dirty

>From 627fcf50e7e03e12dc1c61ec76c32fb48532796a Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Tue, 14 Jan 2025 00:25:01 -0500
Subject: [PATCH v5 5/7] postmaster: Commonalize FatalError paths

This includes some behavioral changes:

- Previously PM_WAIT_XLOG_ARCHIVAL wasn't handled in HandleFatalError(), that
  doesn't seem quite right.

- Previously a fatal error in PM_WAIT_XLOG_SHUTDOWN lead to jumping back to
  PM_WAIT_BACKENDS, no we go to PM_WAIT_DEAD_END. Jumping backwards doesn't
  seem quite right and we didn't do so when checkpointer failed to fork during
  a shutdown.

- Previously a checkpointer fork failure didn't call SetQuitSignalReason(),
  which would lead to quickdie() reporting
  "terminating connection because of unexpected SIGQUIT signal"
  which seems even worse than the PMQUIT_FOR_CRASH message. If I saw that in
  the log I'd suspect somebody outside of postgres sent SIGQUITs

Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
---
 src/backend/postmaster/postmaster.c | 73 ++++++++++++++++++++++-------
 1 file changed, 57 insertions(+), 16 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index a92fe4240b2..95eb989355c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2706,13 +2706,49 @@ HandleFatalError(QuitSignalReason reason, bool consider_sigabrt)
 
 	FatalError = true;
 
-	/* We now transit into a state of waiting for children to die */
-	if (pmState == PM_RECOVERY ||
-		pmState == PM_HOT_STANDBY ||
-		pmState == PM_RUN ||
-		pmState == PM_STOP_BACKENDS ||
-		pmState == PM_WAIT_XLOG_SHUTDOWN)
-		UpdatePMState(PM_WAIT_BACKENDS);
+	/*
+	 * Choose the appropriate new state to react to the fatal error. Unless we
+	 * were already in the process of shutting down, we go through
+	 * PM_WAIT_BACKEND. For errors during the shutdown sequence, we directly
+	 * switch to PM_WAIT_DEAD_END.
+	 */
+	switch (pmState)
+	{
+		case PM_INIT:
+			/* shouldn't have any children */
+			Assert(false);
+			break;
+		case PM_STARTUP:
+			/* should have been handled in process_pm_child_exit */
+			Assert(false);
+			break;
+
+			/* wait for children to die */
+		case PM_RECOVERY:
+		case PM_HOT_STANDBY:
+		case PM_RUN:
+		case PM_STOP_BACKENDS:
+			UpdatePMState(PM_WAIT_BACKENDS);
+			break;
+
+		case PM_WAIT_BACKENDS:
+			/* there might be more backends to wait for */
+			break;
+
+		case PM_WAIT_XLOG_SHUTDOWN:
+		case PM_WAIT_XLOG_ARCHIVAL:
+			/*
+			 * NB: Similar code exists in PostmasterStateMachine()'s handling
+			 * of FatalError in PM_STOP_BACKENDS/PM_WAIT_BACKENDS states.
+			 */
+			ConfigurePostmasterWaitSet(false);
+			UpdatePMState(PM_WAIT_DEAD_END);
+			break;
+
+		case PM_WAIT_DEAD_END:
+		case PM_NO_CHILDREN:
+			break;
+	}
 
 	/*
 	 * .. and if this doesn't happen quickly enough, now the clock is ticking
@@ -2942,15 +2978,18 @@ PostmasterStateMachine(void)
 			{
 				/*
 				 * Stop any dead-end children and stop creating new ones.
+				 *
+				 * NB: Similar code exists in HandleFatalErrors(), when the
+				 * error happens in pmState > PM_WAIT_BACKENDS.
 				 */
 				UpdatePMState(PM_WAIT_DEAD_END);
 				ConfigurePostmasterWaitSet(false);
 				SignalChildren(SIGQUIT, btmask(B_DEAD_END_BACKEND));
 
 				/*
-				 * We already SIGQUIT'd walsenders and the archiver, if any,
-				 * when we started immediate shutdown or entered FatalError
-				 * state.
+				 * We already SIGQUIT'd auxiliary processes (other than
+				 * logger), if any, when we started immediate shutdown or
+				 * entered FatalError state.
 				 */
 			}
 			else
@@ -2981,13 +3020,15 @@ PostmasterStateMachine(void)
 					 * We don't consult send_abort_for_crash here, as it's
 					 * unlikely that dumping cores would illuminate the reason
 					 * for checkpointer fork failure.
+					 *
+					 * XXX: It may be worth to introduce a different PMQUIT
+					 * value that signals that the cluster is in a bad state,
+					 * without a process having crashed. But right now this
+					 * path is very unlikely to be reached, so it isn't
+					 * obviously worthwhile adding a distinct error message in
+					 * quickdie().
 					 */
-					FatalError = true;
-					UpdatePMState(PM_WAIT_DEAD_END);
-					ConfigurePostmasterWaitSet(false);
-
-					/* Kill the walsenders and archiver too */
-					SignalChildren(SIGQUIT, btmask_all_except(B_LOGGER));
+					HandleFatalError(PMQUIT_FOR_CRASH, false);
 				}
 			}
 		}
-- 
2.48.1.76.g4e746b1a31.dirty

>From 810ac95b2ddcaea3b78ab1809cdcc8c522a27dc6 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Tue, 14 Jan 2025 00:31:03 -0500
Subject: [PATCH v5 6/7] postmaster: Adjust which processes we expect to have
 exited

Comments and code stated that we expect checkpointer to have been signalled in
case of immediate shutdown / fatal errors, but didn't treat archiver and
walsenders the same. That doesn't seem right.

I had started digging through the history to see where this oddity was
introduced, but it's not the fault of a single commit.

Instead treat archiver, checkpointer, and walsenders the same.

Reviewed-by: Bertrand Drouvot <[email protected]>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
---
 src/backend/postmaster/postmaster.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 95eb989355c..37bb0ea4d7a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2918,16 +2918,20 @@ PostmasterStateMachine(void)
 
 		/*
 		 * If we are doing crash recovery or an immediate shutdown then we
-		 * expect the checkpointer to exit as well, otherwise not.
+		 * expect archiver, checkpointer and walsender to exit as well,
+		 * otherwise not.
 		 */
 		if (FatalError || Shutdown >= ImmediateShutdown)
-			targetMask = btmask_add(targetMask, B_CHECKPOINTER);
+			targetMask = btmask_add(targetMask,
+									B_CHECKPOINTER,
+									B_ARCHIVER,
+									B_WAL_SENDER);
 
 		/*
-		 * Walsenders and archiver will continue running; they will be
-		 * terminated later after writing the checkpoint record.  We also let
-		 * dead-end children to keep running for now.  The syslogger process
-		 * exits last.
+		 * Normally walsenders and archiver will continue running; they will
+		 * be terminated later after writing the checkpoint record.  We also
+		 * let dead-end children to keep running for now.  The syslogger
+		 * process exits last.
 		 *
 		 * This assertion checks that we have covered all backend types,
 		 * either by including them in targetMask, or by noting here that they
@@ -2938,13 +2942,17 @@ PostmasterStateMachine(void)
 			BackendTypeMask remainMask = BTYPE_MASK_NONE;
 
 			remainMask = btmask_add(remainMask,
-									B_WAL_SENDER,
-									B_ARCHIVER,
 									B_DEAD_END_BACKEND,
 									B_LOGGER);
 
-			/* checkpointer may or may not be in targetMask already */
-			remainMask = btmask_add(remainMask, B_CHECKPOINTER);
+			/*
+			 * Archiver, checkpointer and walsender may or may not be in
+			 * targetMask already.
+			 */
+			remainMask = btmask_add(remainMask,
+									B_ARCHIVER,
+									B_CHECKPOINTER,
+									B_WAL_SENDER);
 
 			/* these are not real postmaster children */
 			remainMask = btmask_add(remainMask,
-- 
2.48.1.76.g4e746b1a31.dirty

>From 3877e1dffa991254082aca07afe4e6d69e1b6ddc Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Fri, 24 Jan 2025 12:25:14 -0500
Subject: [PATCH v5 7/7] Change shutdown sequence to terminate checkpointer
 last

The main motivation for this change is to have a process that can serialize
stats after all other processes have terminated. Serializing stats already
happens in checkpointer, even though walsenders can be active longer.

The only reason the current shutdown sequence does not actively cause
problems, is that walsender currently does not generate any stats. However,
there is a patch to change that.

Another need for this change originates in the AIO patchset, where IO
workers (which, in some edge cases, can emit stats of their own) need to run
while the shutdown checkpoint is being written.

This commit changes the shutdown sequence so checkpointer is signalled (via
SIGINT) to trigger writing the shutdown checkpoint without terminating
it. Once checkpointer wrote the checkpoint it will wait for a termination
signal (SIGUSR2, as before).

Postmaster now triggers the shutdown checkpoint via SIGINT, where we
previously did so by terminating checkpointer. Checkpointer now is terminated
after all children other than dead-end children and logger have been
terminated, tracked using the new PM_WAIT_CHECKPOINTER PMState.

Reviewed-by: Heikki Linnakangas <[email protected]>
Reviewed-by: Bertrand Drouvot <[email protected]>
Reviewed-by: Nazir Bilal Yavuz <[email protected]>
Discussion: https://postgr.es/m/kgng5nrvnlv335evmsuvpnh354rw7qyazl73kdysev2cr2v5zu@m3cfzxicm5kp
---
 src/include/storage/pmsignal.h                |   3 +-
 src/backend/postmaster/checkpointer.c         | 127 ++++++++++++----
 src/backend/postmaster/postmaster.c           | 140 +++++++++++++-----
 .../utils/activity/wait_event_names.txt       |   1 +
 4 files changed, 202 insertions(+), 69 deletions(-)

diff --git a/src/include/storage/pmsignal.h b/src/include/storage/pmsignal.h
index 3fbe5bf1136..d84a383047e 100644
--- a/src/include/storage/pmsignal.h
+++ b/src/include/storage/pmsignal.h
@@ -40,9 +40,10 @@ typedef enum
 	PMSIGNAL_BACKGROUND_WORKER_CHANGE,	/* background worker state change */
 	PMSIGNAL_START_WALRECEIVER, /* start a walreceiver */
 	PMSIGNAL_ADVANCE_STATE_MACHINE, /* advance postmaster's state machine */
+	PMSIGNAL_XLOG_IS_SHUTDOWN,	/* ShutdownXLOG() completed */
 } PMSignalReason;
 
-#define NUM_PMSIGNALS (PMSIGNAL_ADVANCE_STATE_MACHINE+1)
+#define NUM_PMSIGNALS (PMSIGNAL_XLOG_IS_SHUTDOWN+1)
 
 /*
  * Reasons why the postmaster would send SIGQUIT to its children.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index dd2c8376c6e..b94f9cdff21 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -10,10 +10,13 @@
  * fill WAL segments; the checkpointer itself doesn't watch for the
  * condition.)
  *
- * Normal termination is by SIGUSR2, which instructs the checkpointer to
- * execute a shutdown checkpoint and then exit(0).  (All backends must be
- * stopped before SIGUSR2 is issued!)  Emergency termination is by SIGQUIT;
- * like any backend, the checkpointer will simply abort and exit on SIGQUIT.
+ * The normal termination sequence is that checkpointer is instructed to
+ * execute the shutdown checkpoint by SIGINT.  After that checkpointer waits
+ * to be terminated via SIGUSR2, which instructs the checkpointer to exit(0).
+ * All backends must be stopped before SIGINT or SIGUSR2 is issued!
+ *
+ * Emergency termination is by SIGQUIT; like any backend, the checkpointer
+ * will simply abort and exit on SIGQUIT.
  *
  * If the checkpointer exits unexpectedly, the postmaster treats that the same
  * as a backend crash: shared memory may be corrupted, so remaining backends
@@ -51,6 +54,7 @@
 #include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/lwlock.h"
+#include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "storage/procsignal.h"
 #include "storage/shmem.h"
@@ -141,6 +145,7 @@ double		CheckPointCompletionTarget = 0.9;
  * Private state
  */
 static bool ckpt_active = false;
+static volatile sig_atomic_t ShutdownXLOGPending = false;
 
 /* these values are valid when ckpt_active is true: */
 static pg_time_t ckpt_start_time;
@@ -159,6 +164,9 @@ static bool ImmediateCheckpointRequested(void);
 static bool CompactCheckpointerRequestQueue(void);
 static void UpdateSharedMemoryConfig(void);
 
+/* Signal handlers */
+static void ReqShutdownXLOG(SIGNAL_ARGS);
+
 
 /*
  * Main entry point for checkpointer process
@@ -188,7 +196,7 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 	 * tell us it's okay to shut down (via SIGUSR2).
 	 */
 	pqsignal(SIGHUP, SignalHandlerForConfigReload);
-	pqsignal(SIGINT, SIG_IGN);
+	pqsignal(SIGINT, ReqShutdownXLOG);
 	pqsignal(SIGTERM, SIG_IGN); /* ignore SIGTERM */
 	/* SIGQUIT handler was already set up by InitPostmasterChild */
 	pqsignal(SIGALRM, SIG_IGN);
@@ -211,8 +219,11 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 	 * process during a normal shutdown, and since checkpointer is shut down
 	 * very late...
 	 *
-	 * Walsenders are shut down after the checkpointer, but currently don't
-	 * report stats. If that changes, we need a more complicated solution.
+	 * While e.g. walsenders are active after the shutdown checkpoint has been
+	 * written (and thus could produce more stats), checkpointer stays around
+	 * after the shutdown checkpoint has been written. postmaster will only
+	 * signal checkpointer to exit after all processes that could emit stats
+	 * have been shut down.
 	 */
 	before_shmem_exit(pgstat_before_server_shutdown, 0);
 
@@ -327,7 +338,8 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 	ProcGlobal->checkpointerProc = MyProcNumber;
 
 	/*
-	 * Loop forever
+	 * Loop until we've been asked to write the shutdown checkpoint or
+	 * terminate.
 	 */
 	for (;;)
 	{
@@ -346,7 +358,10 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 		 * Process any requests or signals received recently.
 		 */
 		AbsorbSyncRequests();
+
 		HandleCheckpointerInterrupts();
+		if (ShutdownXLOGPending || ShutdownRequestPending)
+			break;
 
 		/*
 		 * Detect a pending checkpoint request by checking whether the flags
@@ -517,8 +532,13 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 
 			ckpt_active = false;
 
-			/* We may have received an interrupt during the checkpoint. */
+			/*
+			 * We may have received an interrupt during the checkpoint and the
+			 * latch might have been reset (e.g. in CheckpointWriteDelay).
+			 */
 			HandleCheckpointerInterrupts();
+			if (ShutdownXLOGPending || ShutdownRequestPending)
+				break;
 		}
 
 		/* Check for archive_timeout and switch xlog files if necessary. */
@@ -557,6 +577,57 @@ CheckpointerMain(char *startup_data, size_t startup_data_len)
 						 cur_timeout * 1000L /* convert to ms */ ,
 						 WAIT_EVENT_CHECKPOINTER_MAIN);
 	}
+
+	/*
+	 * From here on, elog(ERROR) should end with exit(1), not send control
+	 * back to the sigsetjmp block above.
+	 */
+	ExitOnAnyError = true;
+
+	if (ShutdownXLOGPending)
+	{
+		/*
+		 * Close down the database.
+		 *
+		 * Since ShutdownXLOG() creates restartpoint or checkpoint, and
+		 * updates the statistics, increment the checkpoint request and flush
+		 * out pending statistic.
+		 */
+		PendingCheckpointerStats.num_requested++;
+		ShutdownXLOG(0, 0);
+		pgstat_report_checkpointer();
+		pgstat_report_wal(true);
+
+		/*
+		 * Tell postmaster that we're done.
+		 */
+		SendPostmasterSignal(PMSIGNAL_XLOG_IS_SHUTDOWN);
+		ShutdownXLOGPending = false;
+	}
+
+	/*
+	 * Wait until we're asked to shut down. By separating the writing of the
+	 * shutdown checkpoint from checkpointer exiting, checkpointer can perform
+	 * some should-be-as-late-as-possible work like writing out stats.
+	 */
+	for (;;)
+	{
+		/* Clear any already-pending wakeups */
+		ResetLatch(MyLatch);
+
+		HandleCheckpointerInterrupts();
+
+		if (ShutdownRequestPending)
+			break;
+
+		(void) WaitLatch(MyLatch,
+						 WL_LATCH_SET | WL_EXIT_ON_PM_DEATH,
+						 0,
+						 WAIT_EVENT_CHECKPOINTER_SHUTDOWN);
+	}
+
+	/* Normal exit from the checkpointer is here */
+	proc_exit(0);				/* done */
 }
 
 /*
@@ -586,29 +657,6 @@ HandleCheckpointerInterrupts(void)
 		 */
 		UpdateSharedMemoryConfig();
 	}
-	if (ShutdownRequestPending)
-	{
-		/*
-		 * From here on, elog(ERROR) should end with exit(1), not send control
-		 * back to the sigsetjmp block above
-		 */
-		ExitOnAnyError = true;
-
-		/*
-		 * Close down the database.
-		 *
-		 * Since ShutdownXLOG() creates restartpoint or checkpoint, and
-		 * updates the statistics, increment the checkpoint request and flush
-		 * out pending statistic.
-		 */
-		PendingCheckpointerStats.num_requested++;
-		ShutdownXLOG(0, 0);
-		pgstat_report_checkpointer();
-		pgstat_report_wal(true);
-
-		/* Normal exit from the checkpointer is here */
-		proc_exit(0);			/* done */
-	}
 
 	/* Perform logging of memory contexts of this process */
 	if (LogMemoryContextPending)
@@ -729,6 +777,7 @@ CheckpointWriteDelay(int flags, double progress)
 	 * in which case we just try to catch up as quickly as possible.
 	 */
 	if (!(flags & CHECKPOINT_IMMEDIATE) &&
+		!ShutdownXLOGPending &&
 		!ShutdownRequestPending &&
 		!ImmediateCheckpointRequested() &&
 		IsCheckpointOnSchedule(progress))
@@ -857,6 +906,20 @@ IsCheckpointOnSchedule(double progress)
 }
 
 
+/* --------------------------------
+ *		signal handler routines
+ * --------------------------------
+ */
+
+/* SIGINT: set flag to trigger writing of shutdown checkpoint */
+static void
+ReqShutdownXLOG(SIGNAL_ARGS)
+{
+	ShutdownXLOGPending = true;
+	SetLatch(MyLatch);
+}
+
+
 /* --------------------------------
  *		communication with backends
  * --------------------------------
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 37bb0ea4d7a..bb22b13adef 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -341,6 +341,7 @@ typedef enum
 								 * ckpt */
 	PM_WAIT_XLOG_ARCHIVAL,		/* waiting for archiver and walsenders to
 								 * finish */
+	PM_WAIT_CHECKPOINTER,		/* waiting for checkpointer to shut down */
 	PM_WAIT_DEAD_END,			/* waiting for dead-end children to exit */
 	PM_NO_CHILDREN,				/* all important children have exited */
 } PMState;
@@ -2363,35 +2364,20 @@ process_pm_child_exit(void)
 		{
 			ReleasePostmasterChildSlot(CheckpointerPMChild);
 			CheckpointerPMChild = NULL;
-			if (EXIT_STATUS_0(exitstatus) && pmState == PM_WAIT_XLOG_SHUTDOWN)
+			if (EXIT_STATUS_0(exitstatus) && pmState == PM_WAIT_CHECKPOINTER)
 			{
 				/*
 				 * OK, we saw normal exit of the checkpointer after it's been
-				 * told to shut down.  We expect that it wrote a shutdown
-				 * checkpoint.  (If for some reason it didn't, recovery will
-				 * occur on next postmaster start.)
+				 * told to shut down.  We know checkpointer wrote a shutdown
+				 * checkpoint, otherwise we'd still be in
+				 * PM_WAIT_XLOG_SHUTDOWN state.
 				 *
-				 * At this point we should have no normal backend children
-				 * left (else we'd not be in PM_WAIT_XLOG_SHUTDOWN state) but
-				 * we might have dead-end children to wait for.
-				 *
-				 * If we have an archiver subprocess, tell it to do a last
-				 * archive cycle and quit. Likewise, if we have walsender
-				 * processes, tell them to send any remaining WAL and quit.
-				 */
-				Assert(Shutdown > NoShutdown);
-
-				/* Waken archiver for the last time */
-				if (PgArchPMChild != NULL)
-					signal_child(PgArchPMChild, SIGUSR2);
-
-				/*
-				 * Waken walsenders for the last time. No regular backends
-				 * should be around anymore.
+				 * At this point only dead-end children and logger should be
+				 * left.
 				 */
-				SignalChildren(SIGUSR2, btmask(B_WAL_SENDER));
-
-				UpdatePMState(PM_WAIT_XLOG_ARCHIVAL);
+				UpdatePMState(PM_WAIT_DEAD_END);
+				ConfigurePostmasterWaitSet(false);
+				SignalChildren(SIGTERM, btmask_all_except(B_LOGGER));
 			}
 			else
 			{
@@ -2737,6 +2723,8 @@ HandleFatalError(QuitSignalReason reason, bool consider_sigabrt)
 
 		case PM_WAIT_XLOG_SHUTDOWN:
 		case PM_WAIT_XLOG_ARCHIVAL:
+		case PM_WAIT_CHECKPOINTER:
+
 			/*
 			 * NB: Similar code exists in PostmasterStateMachine()'s handling
 			 * of FatalError in PM_STOP_BACKENDS/PM_WAIT_BACKENDS states.
@@ -3011,10 +2999,10 @@ PostmasterStateMachine(void)
 				/* Start the checkpointer if not running */
 				if (CheckpointerPMChild == NULL)
 					CheckpointerPMChild = StartChildProcess(B_CHECKPOINTER);
-				/* And tell it to shut down */
+				/* And tell it to write the shutdown checkpoint */
 				if (CheckpointerPMChild != NULL)
 				{
-					signal_child(CheckpointerPMChild, SIGUSR2);
+					signal_child(CheckpointerPMChild, SIGINT);
 					UpdatePMState(PM_WAIT_XLOG_SHUTDOWN);
 				}
 				else
@@ -3042,22 +3030,40 @@ PostmasterStateMachine(void)
 		}
 	}
 
+	/*
+	 * The state transition from PM_WAIT_XLOG_SHUTDOWN to
+	 * PM_WAIT_XLOG_ARCHIVAL is in process_pm_pmsignal(), in response to
+	 * PMSIGNAL_XLOG_IS_SHUTDOWN.
+	 */
+
 	if (pmState == PM_WAIT_XLOG_ARCHIVAL)
 	{
 		/*
-		 * PM_WAIT_XLOG_ARCHIVAL state ends when there's no other children
-		 * than dead-end children left. There shouldn't be any regular
-		 * backends left by now anyway; what we're really waiting for is
-		 * walsenders and archiver.
+		 * PM_WAIT_XLOG_ARCHIVAL state ends when there are no children other
+		 * than checkpointer, dead-end children and logger left. There
+		 * shouldn't be any regular backends left by now anyway; what we're
+		 * really waiting for is for walsenders and archiver to exit.
 		 */
-		if (CountChildren(btmask_all_except(B_LOGGER, B_DEAD_END_BACKEND)) == 0)
+		if (CountChildren(btmask_all_except(B_CHECKPOINTER, B_LOGGER, B_DEAD_END_BACKEND)) == 0)
 		{
-			UpdatePMState(PM_WAIT_DEAD_END);
-			ConfigurePostmasterWaitSet(false);
-			SignalChildren(SIGTERM, btmask_all_except(B_LOGGER));
+			UpdatePMState(PM_WAIT_CHECKPOINTER);
+
+			/*
+			 * Now that the processes mentioned above are gone, tell
+			 * checkpointer to shut down too. That allows checkpointer to
+			 * perform some last bits of cleanup without other processes
+			 * interfering.
+			 */
+			if (CheckpointerPMChild != NULL)
+				signal_child(CheckpointerPMChild, SIGUSR2);
 		}
 	}
 
+	/*
+	 * The state transition from PM_WAIT_CHECKPOINTER to PM_WAIT_DEAD_END is
+	 * in process_pm_child_exit().
+	 */
+
 	if (pmState == PM_WAIT_DEAD_END)
 	{
 		/*
@@ -3194,6 +3200,7 @@ pmstate_name(PMState state)
 			PM_TOSTR_CASE(PM_WAIT_XLOG_SHUTDOWN);
 			PM_TOSTR_CASE(PM_WAIT_XLOG_ARCHIVAL);
 			PM_TOSTR_CASE(PM_WAIT_DEAD_END);
+			PM_TOSTR_CASE(PM_WAIT_CHECKPOINTER);
 			PM_TOSTR_CASE(PM_NO_CHILDREN);
 	}
 #undef PM_TOSTR_CASE
@@ -3612,6 +3619,8 @@ ExitPostmaster(int status)
 static void
 process_pm_pmsignal(void)
 {
+	bool		request_state_update = false;
+
 	pending_pm_pmsignal = false;
 
 	ereport(DEBUG2,
@@ -3723,9 +3732,67 @@ process_pm_pmsignal(void)
 		WalReceiverRequested = true;
 	}
 
+	if (CheckPostmasterSignal(PMSIGNAL_XLOG_IS_SHUTDOWN))
+	{
+		/* Checkpointer completed the shutdown checkpoint */
+		if (pmState == PM_WAIT_XLOG_SHUTDOWN)
+		{
+			/*
+			 * If we have an archiver subprocess, tell it to do a last archive
+			 * cycle and quit. Likewise, if we have walsender processes, tell
+			 * them to send any remaining WAL and quit.
+			 */
+			Assert(Shutdown > NoShutdown);
+
+			/* Waken archiver for the last time */
+			if (PgArchPMChild != NULL)
+				signal_child(PgArchPMChild, SIGUSR2);
+
+			/*
+			 * Waken walsenders for the last time. No regular backends should
+			 * be around anymore.
+			 */
+			SignalChildren(SIGUSR2, btmask(B_WAL_SENDER));
+
+			UpdatePMState(PM_WAIT_XLOG_ARCHIVAL);
+		}
+		else if (!FatalError && Shutdown != ImmediateShutdown)
+		{
+			/*
+			 * Checkpointer only ought to perform the shutdown checkpoint
+			 * during shutdown.  If somehow checkpointer did so in another
+			 * situation, we have no choice but to crash-restart.
+			 *
+			 * It's possible however that we get PMSIGNAL_XLOG_IS_SHUTDOWN
+			 * outside of PM_WAIT_XLOG_SHUTDOWN if an orderly shutdown was
+			 * "interrupted" by a crash or an immediate shutdown.
+			 */
+			ereport(LOG,
+					(errmsg("WAL was shut down unexpectedly")));
+
+			/*
+			 * Doesn't seem likely to help to take send_abort_for_crash into
+			 * account here.
+			 */
+			HandleFatalError(PMQUIT_FOR_CRASH, false);
+		}
+
+		/*
+		 * Need to run PostmasterStateMachine() to check if we already can go
+		 * to the next state.
+		 */
+		request_state_update = true;
+	}
+
 	/*
 	 * Try to advance postmaster's state machine, if a child requests it.
-	 *
+	 */
+	if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE))
+	{
+		request_state_update = true;
+	}
+
+	/*
 	 * Be careful about the order of this action relative to this function's
 	 * other actions.  Generally, this should be after other actions, in case
 	 * they have effects PostmasterStateMachine would need to know about.
@@ -3733,7 +3800,7 @@ process_pm_pmsignal(void)
 	 * cannot have any (immediate) effect on the state machine, but does
 	 * depend on what state we're in now.
 	 */
-	if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE))
+	if (request_state_update)
 	{
 		PostmasterStateMachine();
 	}
@@ -4044,6 +4111,7 @@ bgworker_should_start_now(BgWorkerStartTime start_time)
 	switch (pmState)
 	{
 		case PM_NO_CHILDREN:
+		case PM_WAIT_CHECKPOINTER:
 		case PM_WAIT_DEAD_END:
 		case PM_WAIT_XLOG_ARCHIVAL:
 		case PM_WAIT_XLOG_SHUTDOWN:
diff --git a/src/backend/utils/activity/wait_event_names.txt b/src/backend/utils/activity/wait_event_names.txt
index 0b53cba807d..e199f071628 100644
--- a/src/backend/utils/activity/wait_event_names.txt
+++ b/src/backend/utils/activity/wait_event_names.txt
@@ -56,6 +56,7 @@ AUTOVACUUM_MAIN	"Waiting in main loop of autovacuum launcher process."
 BGWRITER_HIBERNATE	"Waiting in background writer process, hibernating."
 BGWRITER_MAIN	"Waiting in main loop of background writer process."
 CHECKPOINTER_MAIN	"Waiting in main loop of checkpointer process."
+CHECKPOINTER_SHUTDOWN	"Waiting for checkpointer process to be terminated."
 LOGICAL_APPLY_MAIN	"Waiting in main loop of logical replication apply process."
 LOGICAL_LAUNCHER_MAIN	"Waiting in main loop of logical replication launcher process."
 LOGICAL_PARALLEL_APPLY_MAIN	"Waiting in main loop of logical replication parallel apply process."
-- 
2.48.1.76.g4e746b1a31.dirty

Re: Reorder shutdown sequence, to flush pgstats later

Reply via email to