Background writer and checkpointer in crash recovery

Thomas Munro Sat, 29 Aug 2020 17:13:37 -0700

(Forking this thread from the SLRU fsync one[1] to allow for a
separate CF entry; it's unrelated, except for being another case of
moving work off the recovery process's plate.)


Hello hackers,

Currently we don't run the bgwriter process during crash recovery.
I've CCed Simon and Heikki who established this in commit cdd46c76.
Based on that commit message, I think the bar to clear to change the
policy is to show that it's useful, and that it doesn't make crash
recovery less robust.  See the other thread for some initial evidence
of usefulness from Jakub Wartak.  I think it also just makes intuitive
sense that it's got to help bigger-than-buffer-pool crash recovery if
you can shift buffer eviction out of the recovery loop.  As for
robustness, I suppose we could provide the option to turn it off just
in case that turns out to be useful in some scenarios, but I'm
wondering why we would expect something that we routinely run in
archive/streaming recovery to reduce robustness in only slightly
different circumstances.

Here's an experiment-grade patch, comments welcome, though at this
stage it's primarily thoughts about the concept that I'm hoping to
solicit.

One question raised by Jakub that I don't have a great answer to right
now is whether you'd want different bgwriter settings in this scenario
for best results.  I don't know, but I suspect that the answer is to
make bgwriter more adaptive rather than more configurable, and that's
an orthogonal project.

Once we had the checkpointer running, we could also consider making
the end-of-recovery checkpoint optional, or at least the wait for it
to complete.  I haven't shown that in this patch, but it's just
different checkpoint request flags and possibly an end-of-recovery
record.  What problems do you foresee with that?  Why should we have
"fast" promotion but not "fast" crash recovery?
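
To make the "different checkpoint request flags" idea concrete, here is
a minimal standalone sketch.  It is not part of the patch: the TOY_*
constants and toy_end_of_recovery_flags() are hypothetical stand-ins for
the real CHECKPOINT_* flags and the RequestCheckpoint() call site, and
the numeric values are illustrative only.  It assumes the only
difference between the waiting and non-waiting variants is the WAIT bit;
whether a non-waiting variant would also need other flags or an
end-of-recovery record is exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for checkpoint request flags (illustrative values). */
#define TOY_CHECKPOINT_END_OF_RECOVERY	0x0002
#define TOY_CHECKPOINT_IMMEDIATE		0x0004
#define TOY_CHECKPOINT_WAIT				0x0020

/*
 * Hypothetical helper: compute the flags the startup process would hand
 * to the checkpointer at the end of crash recovery.  With
 * wait_for_completion = true this mirrors the request made in the patch;
 * with false it models a "fast" crash recovery that does not block on
 * the checkpoint.
 */
static int
toy_end_of_recovery_flags(bool wait_for_completion)
{
	int			flags = TOY_CHECKPOINT_END_OF_RECOVERY |
		TOY_CHECKPOINT_IMMEDIATE;

	if (wait_for_completion)
		flags |= TOY_CHECKPOINT_WAIT;

	/* Either way, this is still an end-of-recovery request. */
	assert(flags & TOY_CHECKPOINT_END_OF_RECOVERY);

	return flags;
}
```

In this toy model, toy_end_of_recovery_flags(true) corresponds to the
request the patch makes under the postmaster, and
toy_end_of_recovery_flags(false) is the same request minus the wait.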

[1] https://www.postgresql.org/message-id/flat/CA+hUKGLJ=84yt+nvhkeedauutvhmfq9i-n7k_o50jmq6rpj...@mail.gmail.com

From 21d7d459a6076f85c743b739f1c4ba16451b7046 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Wed, 26 Aug 2020 16:34:33 +1200
Subject: [PATCH 1/2] Run checkpointer and bgwriter in crash recovery.

Relieve the startup process of some writeback and checkpointing work
during crash recovery, making it more like replication recovery.  This
wasn't done back in commit cdd46c76 because it wasn't thought to be
useful at the time.  The theory of this patch is that there may be
workloads where we can profitably move a bunch of buffer eviction work
out of the recovery process.

In order to have a bgwriter running, you also need a checkpointer.
While you could give startup and bgwriter their own backend-private
pending ops tables, it's nicer to merge their requests in one place.

XXX This is just an experimental prototype patch and may contain all
kinds of bugs!
---
 src/backend/access/transam/xlog.c     | 33 ++++++++++-----------------
 src/backend/postmaster/bgwriter.c     |  3 ---
 src/backend/postmaster/checkpointer.c |  3 ---
 src/backend/postmaster/postmaster.c   | 17 ++++++--------
 src/backend/storage/sync/sync.c       | 30 +++---------------------
 src/include/storage/sync.h            |  1 -
 6 files changed, 22 insertions(+), 65 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 09c01ed4ae..d8080ed4f5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -870,9 +870,6 @@ bool		reachedConsistency = false;
 
 static bool InRedo = false;
 
-/* Have we launched bgwriter during recovery? */
-static bool bgwriterLaunched = false;
-
 /* For WALInsertLockAcquire/Release functions */
 static int	MyLockNo = 0;
 static bool holdingAllLocks = false;
@@ -7123,25 +7120,14 @@ StartupXLOG(void)
 		/* Also ensure XLogReceiptTime has a sane value */
 		XLogReceiptTime = GetCurrentTimestamp();
 
+		PublishStartupProcessInformation();
+
 		/*
 		 * Let postmaster know we've started redo now, so that it can launch
-		 * checkpointer to perform restartpoints.  We don't bother during
-		 * crash recovery as restartpoints can only be performed during
-		 * archive recovery.  And we'd like to keep crash recovery simple, to
-		 * avoid introducing bugs that could affect you when recovering after
-		 * crash.
-		 *
-		 * After this point, we can no longer assume that we're the only
-		 * process in addition to postmaster!  Also, fsync requests are
-		 * subsequently to be handled by the checkpointer, not locally.
+		 * the archiver if necessary.
 		 */
-		if (ArchiveRecoveryRequested && IsUnderPostmaster)
-		{
-			PublishStartupProcessInformation();
-			EnableSyncRequestForwarding();
+		if (IsUnderPostmaster)
 			SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
-			bgwriterLaunched = true;
-		}
 
 		/*
 		 * Allow read-only connections immediately if we're consistent
@@ -7729,7 +7715,7 @@ StartupXLOG(void)
 		 * after we're fully out of recovery mode and already accepting
 		 * queries.
 		 */
-		if (bgwriterLaunched)
+		if (ArchiveRecoveryRequested && IsUnderPostmaster)
 		{
 			if (LocalPromoteIsTriggered)
 			{
@@ -7764,8 +7750,13 @@ StartupXLOG(void)
 								  CHECKPOINT_IMMEDIATE |
 								  CHECKPOINT_WAIT);
 		}
+		else if (IsUnderPostmaster)
+			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
+							  CHECKPOINT_IMMEDIATE |
+							  CHECKPOINT_WAIT);
 		else
-			CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
+			CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY |
+							 CHECKPOINT_IMMEDIATE);
 	}
 
 	if (ArchiveRecoveryRequested)
@@ -11891,7 +11882,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 		 * Request a restartpoint if we've replayed too much xlog since the
 		 * last one.
 		 */
-		if (bgwriterLaunched)
+		if (ArchiveRecoveryRequested && IsUnderPostmaster)
 		{
 			if (XLogCheckpointNeeded(readSegNo))
 			{
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 069e27e427..56de0bee18 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -12,9 +12,6 @@
  *
  * As of Postgres 9.2 the bgwriter no longer handles checkpoints.
  *
- * The bgwriter is started by the postmaster as soon as the startup subprocess
- * finishes, or as soon as recovery begins if we are doing archive recovery.
- * It remains alive until the postmaster commands it to terminate.
  * Normal termination is by SIGTERM, which instructs the bgwriter to exit(0).
  * Emergency termination is by SIGQUIT; like any backend, the bgwriter will
  * simply abort and exit on SIGQUIT.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 624a3238b8..883a0df198 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -10,9 +10,6 @@
  * fill WAL segments; the checkpointer itself doesn't watch for the
  * condition.)
  *
- * The checkpointer is started by the postmaster as soon as the startup
- * subprocess finishes, or as soon as recovery begins if we are doing archive
- * recovery.  It remains alive until the postmaster commands it to terminate.
  * Normal termination is by SIGUSR2, which instructs the checkpointer to
  * execute a shutdown checkpoint and then exit(0).  (All backends must be
  * stopped before SIGUSR2 is issued!)  Emergency termination is by SIGQUIT;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 42223c0f61..47cf1c11f7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1400,6 +1400,12 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
+	/* Start bgwriter and checkpointer so they can help with recovery */
+	if (CheckpointerPID == 0)
+		CheckpointerPID = StartCheckpointer();
+	if (BgWriterPID == 0)
+		BgWriterPID = StartBackgroundWriter();
+
 	/*
 	 * We're ready to rock and roll...
 	 */
@@ -1761,7 +1767,7 @@ ServerLoop(void)
 		 * fails, we'll just try again later.  Likewise for the checkpointer.
 		 */
 		if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-			pmState == PM_HOT_STANDBY)
+			pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
 		{
 			if (CheckpointerPID == 0)
 				CheckpointerPID = StartCheckpointer();
@@ -5208,15 +5214,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		FatalError = false;
 		AbortStartTime = 0;
 
-		/*
-		 * Crank up the background tasks.  It doesn't matter if this fails,
-		 * we'll just try again later.
-		 */
-		Assert(CheckpointerPID == 0);
-		CheckpointerPID = StartCheckpointer();
-		Assert(BgWriterPID == 0);
-		BgWriterPID = StartBackgroundWriter();
-
 		/*
 		 * Start the archiver if we're responsible for (re-)archiving received
 		 * files.
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index 3ded2cdd71..27c28b8562 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -107,10 +107,10 @@ InitSync(void)
 {
 	/*
 	 * Create pending-operations hashtable if we need it.  Currently, we need
-	 * it if we are standalone (not under a postmaster) or if we are a startup
-	 * or checkpointer auxiliary process.
+	 * it if we are standalone (not under a postmaster) or if we are a
+	 * checkpointer auxiliary process.
 	 */
-	if (!IsUnderPostmaster || AmStartupProcess() || AmCheckpointerProcess())
+	if (!IsUnderPostmaster || AmCheckpointerProcess())
 	{
 		HASHCTL		hash_ctl;
 
@@ -568,27 +568,3 @@ RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
 
 	return ret;
 }
-
-/*
- * In archive recovery, we rely on checkpointer to do fsyncs, but we will have
- * already created the pendingOps during initialization of the startup
- * process.  Calling this function drops the local pendingOps so that
- * subsequent requests will be forwarded to checkpointer.
- */
-void
-EnableSyncRequestForwarding(void)
-{
-	/* Perform any pending fsyncs we may have queued up, then drop table */
-	if (pendingOps)
-	{
-		ProcessSyncRequests();
-		hash_destroy(pendingOps);
-	}
-	pendingOps = NULL;
-
-	/*
-	 * We should not have any pending unlink requests, since mdunlink doesn't
-	 * queue unlink requests when isRedo.
-	 */
-	Assert(pendingUnlinks == NIL);
-}
diff --git a/src/include/storage/sync.h b/src/include/storage/sync.h
index e16ab8e711..9622854c71 100644
--- a/src/include/storage/sync.h
+++ b/src/include/storage/sync.h
@@ -55,7 +55,6 @@ extern void SyncPreCheckpoint(void);
 extern void SyncPostCheckpoint(void);
 extern void ProcessSyncRequests(void);
 extern void RememberSyncRequest(const FileTag *ftag, SyncRequestType type);
-extern void EnableSyncRequestForwarding(void);
 extern bool RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
 								bool retryOnError);
 
-- 
2.20.1
