Re: Background writer and checkpointer in crash recovery

Thomas Munro Fri, 12 Mar 2021 14:16:56 -0800

On Wed, Feb 3, 2021 at 11:11 AM Robert Haas <[email protected]> wrote:
> I think the way it works right now is stupid and the proposed change
> is going in the right direction. We have ample evidence already that
> handing off fsyncs to a background process is a good idea, and there's
> no reason why that shouldn't be beneficial during crash recovery just
> as it is at other times. But even if it somehow failed to improve
> performance during recovery, there's another good reason to do this,
> which is that it would make the code simpler. Having the pendingOps
> stuff in the startup process in some recovery situations and in the
> checkpointer in other recovery situations makes this harder to reason
> about. As Tom said, the system state where bgwriter and checkpointer
> are not running is an uncommon one, and is probably more likely to
> have (or grow) bugs than the state where they are running.


Yeah, it's a good argument.

> The rat's-nest of logic introduced by the comment "Perform a
> checkpoint to update all our recovery activity to disk." inside
> StartupXLOG() could really do with some simplification. Right now we
> have three cases: CreateEndOfRecoveryRecord(), RequestCheckpoint(),
> and CreateCheckpoint(). Maybe with this change we could get it down to
> just two, since RequestCheckpoint() already knows what to do about
> !IsUnderPostmaster.

True.  Done in this version.

Here's a rebase of this patch + Simon's patch to report on stats.

I also have a sketch patch to provide a GUC that turns off the
end-of-recovery checkpoint as mentioned earlier, attached (sharing
mainly because this is one of the stack of patches that Jakub was
testing for his baseback/PITR workloads and he might want to test some
more), but I'm not proposing that for PG14.  That idea is tangled up
with the "relfile tombstone" stuff I wrote about elsewhere[1], but I
haven't finished studying the full arsenal of footguns in that area
(it's something like: if we don't wait for end-of-recovery checkpoint
before allowing connections, then we'd better start creating
tombstones in recovery unless the WAL level is high enough to avoid
data eating hazards with unlogged changes and a double crash).

[1] https://commitfest.postgresql.org/33/3030/

From 7a82c9e622ea45c981d05dff08c1d4279ec3c73b Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Wed, 26 Aug 2020 16:34:33 +1200
Subject: [PATCH v2 1/3] Run checkpointer and bgworker in crash recovery.

Relieve the startup process of some writeback and checkpointing work
during crash recovery, making it more like replication recovery.  This
wasn't done back in commit cdd46c76 out of caution, but now it seems
more conservative to have the crash recovery environment work the same
as the more commonly exercised streaming replication environment.

In order to have a bgwriter running, you also need a checkpointer.
While you could give startup and bgwriter their own backend private
pending ops table, it's nicer to merger their requests in one place.

Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Simon Riggs <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Tested-by:  Jakub Wartak <[email protected]>
Discussion: https://postgr.es/m/CA%2BhUKGJ8NRsqgkZEnsnRc2MFROBV-jCnacbYvtpptK2A9YYp9Q%40mail.gmail.com
---
 src/backend/access/transam/xlog.c     | 32 +++++++++------------------
 src/backend/postmaster/bgwriter.c     |  3 ---
 src/backend/postmaster/checkpointer.c |  3 ---
 src/backend/postmaster/postmaster.c   | 17 ++++++--------
 src/backend/storage/sync/sync.c       | 30 +++----------------------
 src/include/storage/sync.h            |  1 -
 6 files changed, 21 insertions(+), 65 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f4d1ce5dea..6e8e6cf1e4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -880,9 +880,6 @@ bool		reachedConsistency = false;
 
 static bool InRedo = false;
 
-/* Have we launched bgwriter during recovery? */
-static bool bgwriterLaunched = false;
-
 /* For WALInsertLockAcquire/Release functions */
 static int	MyLockNo = 0;
 static bool holdingAllLocks = false;
@@ -7268,25 +7265,14 @@ StartupXLOG(void)
 		/* Also ensure XLogReceiptTime has a sane value */
 		XLogReceiptTime = GetCurrentTimestamp();
 
+		PublishStartupProcessInformation();
+
 		/*
 		 * Let postmaster know we've started redo now, so that it can launch
-		 * checkpointer to perform restartpoints.  We don't bother during
-		 * crash recovery as restartpoints can only be performed during
-		 * archive recovery.  And we'd like to keep crash recovery simple, to
-		 * avoid introducing bugs that could affect you when recovering after
-		 * crash.
-		 *
-		 * After this point, we can no longer assume that we're the only
-		 * process in addition to postmaster!  Also, fsync requests are
-		 * subsequently to be handled by the checkpointer, not locally.
+		 * the archiver if necessary.
 		 */
-		if (ArchiveRecoveryRequested && IsUnderPostmaster)
-		{
-			PublishStartupProcessInformation();
-			EnableSyncRequestForwarding();
+		if (IsUnderPostmaster)
 			SendPostmasterSignal(PMSIGNAL_RECOVERY_STARTED);
-			bgwriterLaunched = true;
-		}
 
 		/*
 		 * Allow read-only connections immediately if we're consistent
@@ -7879,7 +7865,7 @@ StartupXLOG(void)
 		 * after we're fully out of recovery mode and already accepting
 		 * queries.
 		 */
-		if (bgwriterLaunched)
+		if (ArchiveRecoveryRequested && IsUnderPostmaster)
 		{
 			if (LocalPromoteIsTriggered)
 			{
@@ -7915,7 +7901,11 @@ StartupXLOG(void)
 								  CHECKPOINT_WAIT);
 		}
 		else
-			CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
+		{
+			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
+							  CHECKPOINT_IMMEDIATE |
+							  CHECKPOINT_WAIT);
+		}
 	}
 
 	if (ArchiveRecoveryRequested)
@@ -12123,7 +12113,7 @@ XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
 		 * Request a restartpoint if we've replayed too much xlog since the
 		 * last one.
 		 */
-		if (bgwriterLaunched)
+		if (ArchiveRecoveryRequested && IsUnderPostmaster)
 		{
 			if (XLogCheckpointNeeded(readSegNo))
 			{
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 715d5195bb..5584f4bc24 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -12,9 +12,6 @@
  *
  * As of Postgres 9.2 the bgwriter no longer handles checkpoints.
  *
- * The bgwriter is started by the postmaster as soon as the startup subprocess
- * finishes, or as soon as recovery begins if we are doing archive recovery.
- * It remains alive until the postmaster commands it to terminate.
  * Normal termination is by SIGTERM, which instructs the bgwriter to exit(0).
  * Emergency termination is by SIGQUIT; like any backend, the bgwriter will
  * simply abort and exit on SIGQUIT.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 5907a7befc..7b7f01e1fc 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -10,9 +10,6 @@
  * fill WAL segments; the checkpointer itself doesn't watch for the
  * condition.)
  *
- * The checkpointer is started by the postmaster as soon as the startup
- * subprocess finishes, or as soon as recovery begins if we are doing archive
- * recovery.  It remains alive until the postmaster commands it to terminate.
  * Normal termination is by SIGUSR2, which instructs the checkpointer to
  * execute a shutdown checkpoint and then exit(0).  (All backends must be
  * stopped before SIGUSR2 is issued!)  Emergency termination is by SIGQUIT;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index edab95a19e..4d5b4b9775 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1392,6 +1392,12 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	AddToDataDirLockFile(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_STARTING);
 
+	/* Start bgwriter and checkpointer so they can help with recovery */
+	if (CheckpointerPID == 0)
+		CheckpointerPID = StartCheckpointer();
+	if (BgWriterPID == 0)
+		BgWriterPID = StartBackgroundWriter();
+
 	/*
 	 * We're ready to rock and roll...
 	 */
@@ -1754,7 +1760,7 @@ ServerLoop(void)
 		 * fails, we'll just try again later.  Likewise for the checkpointer.
 		 */
 		if (pmState == PM_RUN || pmState == PM_RECOVERY ||
-			pmState == PM_HOT_STANDBY)
+			pmState == PM_HOT_STANDBY || pmState == PM_STARTUP)
 		{
 			if (CheckpointerPID == 0)
 				CheckpointerPID = StartCheckpointer();
@@ -5125,15 +5131,6 @@ sigusr1_handler(SIGNAL_ARGS)
 		FatalError = false;
 		AbortStartTime = 0;
 
-		/*
-		 * Crank up the background tasks.  It doesn't matter if this fails,
-		 * we'll just try again later.
-		 */
-		Assert(CheckpointerPID == 0);
-		CheckpointerPID = StartCheckpointer();
-		Assert(BgWriterPID == 0);
-		BgWriterPID = StartBackgroundWriter();
-
 		/*
 		 * Start the archiver if we're responsible for (re-)archiving received
 		 * files.
diff --git a/src/backend/storage/sync/sync.c b/src/backend/storage/sync/sync.c
index 708215614d..ee18898ea2 100644
--- a/src/backend/storage/sync/sync.c
+++ b/src/backend/storage/sync/sync.c
@@ -129,10 +129,10 @@ InitSync(void)
 {
 	/*
 	 * Create pending-operations hashtable if we need it.  Currently, we need
-	 * it if we are standalone (not under a postmaster) or if we are a startup
-	 * or checkpointer auxiliary process.
+	 * it if we are standalone (not under a postmaster) or if we are a
+	 * checkpointer auxiliary process.
 	 */
-	if (!IsUnderPostmaster || AmStartupProcess() || AmCheckpointerProcess())
+	if (!IsUnderPostmaster || AmCheckpointerProcess())
 	{
 		HASHCTL		hash_ctl;
 
@@ -589,27 +589,3 @@ RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
 
 	return ret;
 }
-
-/*
- * In archive recovery, we rely on checkpointer to do fsyncs, but we will have
- * already created the pendingOps during initialization of the startup
- * process.  Calling this function drops the local pendingOps so that
- * subsequent requests will be forwarded to checkpointer.
- */
-void
-EnableSyncRequestForwarding(void)
-{
-	/* Perform any pending fsyncs we may have queued up, then drop table */
-	if (pendingOps)
-	{
-		ProcessSyncRequests();
-		hash_destroy(pendingOps);
-	}
-	pendingOps = NULL;
-
-	/*
-	 * We should not have any pending unlink requests, since mdunlink doesn't
-	 * queue unlink requests when isRedo.
-	 */
-	Assert(pendingUnlinks == NIL);
-}
diff --git a/src/include/storage/sync.h b/src/include/storage/sync.h
index fbdf34f762..6fd50cfa7b 100644
--- a/src/include/storage/sync.h
+++ b/src/include/storage/sync.h
@@ -60,7 +60,6 @@ extern void SyncPreCheckpoint(void);
 extern void SyncPostCheckpoint(void);
 extern void ProcessSyncRequests(void);
 extern void RememberSyncRequest(const FileTag *ftag, SyncRequestType type);
-extern void EnableSyncRequestForwarding(void);
 extern bool RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
 								bool retryOnError);
 
-- 
2.30.1

From 3568596c9e2a6331f60d8e25bff35546279a42e4 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Sat, 13 Mar 2021 00:57:38 +1300
Subject: [PATCH v2 2/3] Log buffer stats after crash recovery.

Author: Simon Riggs <[email protected]>
Discussion: https://postgr.es/m/CANP8%2BjLqwRoKQEvAi%3DiRhSwgL1JWNxiUhx%2BRyG-D82CE_qrDew%40mail.gmail.com
---
 src/backend/access/transam/xlog.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6e8e6cf1e4..b10b1b614a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7574,6 +7574,29 @@ StartupXLOG(void)
 					(errmsg("redo done at %X/%X system usage: %s",
 							LSN_FORMAT_ARGS(ReadRecPtr),
 							pg_rusage_show(&ru0))));
+			if (!InArchiveRecovery)
+			{
+				if (!INSTR_TIME_IS_ZERO(pgBufferUsage.blk_read_time))
+					ereport(LOG,
+						(errmsg("crash recovery complete: wrote %ld buffers (%.1f%%) (%0.3f ms); "
+								"dirtied %ld buffers; read %ld buffers (%0.3f ms)",
+								pgBufferUsage.shared_blks_written,
+								(double) pgBufferUsage.shared_blks_written * 100 / NBuffers,
+								INSTR_TIME_GET_MILLISEC(pgBufferUsage.blk_write_time),
+								pgBufferUsage.shared_blks_dirtied,
+								pgBufferUsage.shared_blks_read,
+								INSTR_TIME_GET_MILLISEC(pgBufferUsage.blk_read_time)
+								)));
+				else
+					ereport(LOG,
+						(errmsg("crash recovery complete: wrote %ld buffers (%.1f%%); "
+								"dirtied %ld buffers; read %ld buffers",
+								pgBufferUsage.shared_blks_written,
+								(double) pgBufferUsage.shared_blks_written * 100 / NBuffers,
+								pgBufferUsage.shared_blks_dirtied,
+								pgBufferUsage.shared_blks_read
+								)));
+			}
 			xtime = GetLatestXTime();
 			if (xtime)
 				ereport(LOG,
-- 
2.30.1

From 6420ae1112818ed7e4f4d5ecd306ef10f8873556 Mon Sep 17 00:00:00 2001
From: Thomas Munro <[email protected]>
Date: Fri, 28 Aug 2020 16:35:25 +1200
Subject: [PATCH v2 3/3] Optionally, don't wait for end-of-recovery checkpoint.

Add a GUC to specify whether you want to wait for the end-of-recovery
checkpoint to complete before continuing.  Default to the traditional
behavior, "on".

XXX This probably needs a better GUC name.  There may also be problems with
timelines; see the nearby fast promotion code.

XXX This is experimental code only!  There may be fundamental problems
with the concept.  Or not.
---
 src/backend/access/transam/xlog.c | 23 +++++++++++++++--------
 src/backend/utils/misc/guc.c      |  9 +++++++++
 src/include/access/xlog.h         |  1 +
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b10b1b614a..4c001c466b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -111,6 +111,7 @@ int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
 int			wal_retrieve_retry_interval = 5000;
 int			max_slot_wal_keep_size_mb = -1;
 bool		track_wal_io_timing = false;
+bool		end_of_recovery_checkpoint_wait = true;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -7874,6 +7875,18 @@ StartupXLOG(void)
 
 	if (InRecovery)
 	{
+		int		checkpoint_flags = CHECKPOINT_IMMEDIATE;
+
+		/*
+		 * Should we wait for the end-of-recovery checkpoint to complete?
+		 * Traditionally we do this, but it may be useful to allow connections
+		 * sooner.
+		 */
+		if (end_of_recovery_checkpoint_wait)
+			checkpoint_flags |= CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_WAIT;
+		else
+			checkpoint_flags |= CHECKPOINT_FORCE;
+
 		/*
 		 * Perform a checkpoint to update all our recovery activity to disk.
 		 *
@@ -7919,16 +7932,10 @@ StartupXLOG(void)
 			}
 
 			if (!promoted)
-				RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-								  CHECKPOINT_IMMEDIATE |
-								  CHECKPOINT_WAIT);
+				RequestCheckpoint(checkpoint_flags);
 		}
 		else
-		{
-			RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-							  CHECKPOINT_IMMEDIATE |
-							  CHECKPOINT_WAIT);
-		}
+			RequestCheckpoint(checkpoint_flags);
 	}
 
 	if (ArchiveRecoveryRequested)
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 855076b1fd..e59194a3f8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2058,6 +2058,15 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"end_of_recovery_checkpoint_wait", PGC_SIGHUP, WAL_CHECKPOINTS,
+			gettext_noop("Whether to wait for the end-of-recovery checkpoint to complete."),
+		},
+		&end_of_recovery_checkpoint_wait,
+		true,
+		NULL, NULL, NULL
+	},
+
 	/* End-of-list marker */
 	{
 		{NULL, 0, 0, NULL, NULL}, NULL, false, NULL, NULL, NULL
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 6d384d3ce6..77b4560f4b 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -132,6 +132,7 @@ extern char *PrimaryConnInfo;
 extern char *PrimarySlotName;
 extern bool wal_receiver_create_temp_slot;
 extern bool track_wal_io_timing;
+extern bool end_of_recovery_checkpoint_wait;
 
 /* indirectly set via GUC system */
 extern TransactionId recoveryTargetXid;
-- 
2.30.1

Re: Background writer and checkpointer in crash recovery

Reply via email to