Re: using an end-of-recovery record in all cases

Amul Sul Sun, 17 Oct 2021 22:27:49 -0700

On Wed, Oct 6, 2021 at 7:24 PM Amul Sul <sula...@gmail.com> wrote:
>
> On Tue, Oct 5, 2021 at 10:42 PM Robert Haas <robertmh...@gmail.com> wrote:
> >
> > On Tue, Oct 5, 2021 at 12:41 PM Amul Sul <sula...@gmail.com> wrote:
> > > No, InRecovery flag get cleared before this point. I think, we can use 
> > > lastReplayedEndRecPtr what you have suggested in other thread.
> >
> > Hmm, right, that makes sense. Perhaps I should start remembering what
> > I said in my own emails.
> >
>
> Here I end up with the attached version where I have dropped the
> changes for standby.c and 018_wal_optimize.pl files.  Also, I am not
> sure that we should have the changes for bgwriter.c and slot.c in this
> patch, but that's not touched.
>


Attached is the rebased and updated version. The patch removes the
newly introduced PerformRecoveryXLogAction() function. In addition to
that, removed the CHECKPOINT_END_OF_RECOVERY flag and its related
code.  Also, dropped changes for bgwriter.c and slot.c in this patch, which
seem unrelated to this work.

Regards,
Amul

From 7cf4d270ef8649426fb16f2594a7ed7dcf8cfa5d Mon Sep 17 00:00:00 2001
From: Amul Sul <amul.sul@enterprisedb.com>
Date: Fri, 15 Oct 2021 10:38:55 -0400
Subject: [PATCH v3] Always use an end-of-recovery record rather than a
 checkpoint.

Robert Haas, with modifications by Amul Sul.
---
 src/backend/access/transam/xlog.c     | 144 ++++++--------------------
 src/backend/postmaster/checkpointer.c |  11 +-
 src/backend/storage/buffer/bufmgr.c   |  14 ++-
 src/include/access/xlog.h             |  16 ++-
 4 files changed, 47 insertions(+), 138 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 62862255fca..c6543f35da4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -939,7 +939,6 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
 							  int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
-static bool PerformRecoveryXLogAction(void);
 static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader,
 										XLogRecPtr RecPtr, int whichChkpt, bool report);
 static bool rescanLatestTimeLine(void);
@@ -6634,7 +6633,6 @@ StartupXLOG(void)
 	DBState		dbstate_at_startup;
 	XLogReaderState *xlogreader;
 	XLogPageReadPrivate private;
-	bool		promoted = false;
 	struct stat st;
 
 	/*
@@ -8083,7 +8081,7 @@ StartupXLOG(void)
 	LocalXLogInsertAllowed = -1;
 
 	/*
-	 * Emit checkpoint or end-of-recovery record in XLOG, if required.
+	 * Insert a special WAL record to mark the end of recovery.
 	 *
 	 * XLogCtl->lastReplayedEndRecPtr will be a valid LSN if and only if we
 	 * entered recovery. Even if we ultimately replayed no WAL records, it will
@@ -8092,7 +8090,7 @@ StartupXLOG(void)
 	 * we reach this code.
 	 */
 	if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr))
-		promoted = PerformRecoveryXLogAction();
+		CreateEndOfRecoveryRecord();
 
 	/* If this is archive recovery, perform post-recovery cleanup actions. */
 	if (ArchiveRecoveryRequested)
@@ -8156,12 +8154,22 @@ StartupXLOG(void)
 	WalSndWakeup();
 
 	/*
-	 * If this was a promotion, request an (online) checkpoint now. This isn't
-	 * required for consistency, but the last restartpoint might be far back,
-	 * and in case of a crash, recovering from it might take a longer than is
-	 * appropriate now that we're not in standby mode anymore.
+	 * Request an (online) checkpoint now. Note that, until this is complete,
+	 * a crash would start replay from the same WAL location we did, or from
+	 * the last restartpoint that completed. We don't want to let that
+	 * situation persist for longer than necessary, since users don't like
+	 * long recovery times. On the other hand, they also want to be able to
+	 * start doing useful work again as quickly as possible. Therfore, we
+	 * don't pass CHECKPOINT_IMMEDIATE to avoid bogging down the system.
+	 *
+	 * Note that the consequence of requesting a checkpoint here only after
+	 * we've allowed WAL writes is that a single checkpoint cycle can span
+	 * multiple server lifetimes. So for example if you want to something to
+	 * happen at least once per checkpoint cycle or at most once per
+	 * checkpoint cycle, you have to consider what happens if the server
+	 * is restarted someplace in the middle.
 	 */
-	if (promoted)
+	if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr))
 		RequestCheckpoint(CHECKPOINT_FORCE);
 }
 
@@ -8261,60 +8269,6 @@ CheckRecoveryConsistency(void)
 	}
 }
 
-/*
- * Perform whatever XLOG actions are necessary at end of REDO.
- *
- * The goal here is to make sure that we'll be able to recover properly if
- * we crash again. If we choose to write a checkpoint, we'll write a shutdown
- * checkpoint rather than an on-line one. This is not particularly critical,
- * but since we may be assigning a new TLI, using a shutdown checkpoint allows
- * us to have the rule that TLI only changes in shutdown checkpoints, which
- * allows some extra error checking in xlog_redo.
- */
-static bool
-PerformRecoveryXLogAction(void)
-{
-	bool		promoted = false;
-
-	/*
-	 * Perform a checkpoint to update all our recovery activity to disk.
-	 *
-	 * Note that we write a shutdown checkpoint rather than an on-line one. This
-	 * is not particularly critical, but since we may be assigning a new TLI,
-	 * using a shutdown checkpoint allows us to have the rule that TLI only
-	 * changes in shutdown checkpoints, which allows some extra error checking
-	 * in xlog_redo.
-	 *
-	 * In promotion, only create a lightweight end-of-recovery record instead of
-	 * a full checkpoint. A checkpoint is requested later, after we're fully out
-	 * of recovery mode and already accepting queries.
-	 */
-	if (ArchiveRecoveryRequested && IsUnderPostmaster &&
-		LocalPromoteIsTriggered)
-	{
-		promoted = true;
-
-		/*
-		 * Insert a special WAL record to mark the end of recovery, since we
-		 * aren't doing a checkpoint. That means that the checkpointer process
-		 * may likely be in the middle of a time-smoothed restartpoint and could
-		 * continue to be for minutes after this.  That sounds strange, but the
-		 * effect is roughly the same and it would be stranger to try to come
-		 * out of the restartpoint and then checkpoint. We request a checkpoint
-		 * later anyway, just for safety.
-		 */
-		CreateEndOfRecoveryRecord();
-	}
-	else
-	{
-		RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY |
-						  CHECKPOINT_IMMEDIATE |
-						  CHECKPOINT_WAIT);
-	}
-
-	return promoted;
-}
-
 /*
  * Is the system still in recovery?
  *
@@ -8793,9 +8747,8 @@ LogCheckpointStart(int flags, bool restartpoint)
 	if (restartpoint)
 		ereport(LOG,
 		/* translator: the placeholders show checkpoint options */
-				(errmsg("restartpoint starting:%s%s%s%s%s%s%s%s",
+				(errmsg("restartpoint starting:%s%s%s%s%s%s%s",
 						(flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
-						(flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
 						(flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
 						(flags & CHECKPOINT_FORCE) ? " force" : "",
 						(flags & CHECKPOINT_WAIT) ? " wait" : "",
@@ -8805,9 +8758,8 @@ LogCheckpointStart(int flags, bool restartpoint)
 	else
 		ereport(LOG,
 		/* translator: the placeholders show checkpoint options */
-				(errmsg("checkpoint starting:%s%s%s%s%s%s%s%s",
+				(errmsg("checkpoint starting:%s%s%s%s%s%s%s",
 						(flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "",
-						(flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "",
 						(flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "",
 						(flags & CHECKPOINT_FORCE) ? " force" : "",
 						(flags & CHECKPOINT_WAIT) ? " wait" : "",
@@ -8953,13 +8905,12 @@ static void
 update_checkpoint_display(int flags, bool restartpoint, bool reset)
 {
 	/*
-	 * The status is reported only for end-of-recovery and shutdown
-	 * checkpoints or shutdown restartpoints.  Updating the ps display is
-	 * useful in those situations as it may not be possible to rely on
-	 * pg_stat_activity to see the status of the checkpointer or the startup
-	 * process.
+	 * The status is reported only for shutdown checkpoints or shutdown
+	 * restartpoints.  Updating the ps display is useful in those situations as
+	 * it may not be possible to rely on pg_stat_activity to see the status of
+	 * the checkpointer or the startup process.
 	 */
-	if ((flags & (CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IS_SHUTDOWN)) == 0)
+	if ((flags & CHECKPOINT_IS_SHUTDOWN) == 0)
 		return;
 
 	if (reset)
@@ -8968,8 +8919,7 @@ update_checkpoint_display(int flags, bool restartpoint, bool reset)
 	{
 		char		activitymsg[128];
 
-		snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s",
-				 (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "",
+		snprintf(activitymsg, sizeof(activitymsg), "performing %s%s",
 				 (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "",
 				 restartpoint ? "restartpoint" : "checkpoint");
 		set_ps_display(activitymsg);
@@ -8982,12 +8932,10 @@ update_checkpoint_display(int flags, bool restartpoint, bool reset)
  *
  * flags is a bitwise OR of the following:
  *	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
- *	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
  *	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
  *		ignoring checkpoint_completion_target parameter.
  *	CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
- *		since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
- *		CHECKPOINT_END_OF_RECOVERY).
+ *		since the last one (implied by CHECKPOINT_IS_SHUTDOWN).
  *	CHECKPOINT_FLUSH_ALL: also flush buffers of unlogged tables.
  *
  * Note: flags contains other bits, of interest here only for logging purposes.
@@ -9021,17 +8969,11 @@ CreateCheckPoint(int flags)
 	VirtualTransactionId *vxids;
 	int			nvxids;
 
-	/*
-	 * An end-of-recovery checkpoint is really a shutdown checkpoint, just
-	 * issued at a different time.
-	 */
-	if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
-		shutdown = true;
-	else
-		shutdown = false;
+	/* Is shutdown checkpoint? */
+	shutdown = ((flags & CHECKPOINT_IS_SHUTDOWN) != 0);
 
 	/* sanity check */
-	if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
+	if (RecoveryInProgress())
 		elog(ERROR, "can't create a checkpoint during recovery");
 
 	/*
@@ -9107,8 +9049,7 @@ CreateCheckPoint(int flags)
 	 * WAL activity requiring a checkpoint, skip it.  The idea here is to
 	 * avoid inserting duplicate checkpoints when the system is idle.
 	 */
-	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
-				  CHECKPOINT_FORCE)) == 0)
+	if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FORCE)) == 0)
 	{
 		if (last_important_lsn == ControlFile->checkPoint)
 		{
@@ -9120,20 +9061,8 @@ CreateCheckPoint(int flags)
 		}
 	}
 
-	/*
-	 * An end-of-recovery checkpoint is created before anyone is allowed to
-	 * write WAL. To allow us to write the checkpoint record, temporarily
-	 * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
-	 * initialized, which we need here and in AdvanceXLInsertBuffer.)
-	 */
-	if (flags & CHECKPOINT_END_OF_RECOVERY)
-		LocalSetXLogInsertAllowed();
-
 	checkPoint.ThisTimeLineID = ThisTimeLineID;
-	if (flags & CHECKPOINT_END_OF_RECOVERY)
-		checkPoint.PrevTimeLineID = XLogCtl->PrevTimeLineID;
-	else
-		checkPoint.PrevTimeLineID = ThisTimeLineID;
+	checkPoint.PrevTimeLineID = ThisTimeLineID;
 
 	checkPoint.fullPageWrites = Insert->fullPageWrites;
 
@@ -9300,17 +9229,10 @@ CreateCheckPoint(int flags)
 	/*
 	 * We mustn't write any new WAL after a shutdown checkpoint, or it will be
 	 * overwritten at next startup.  No-one should even try, this just allows
-	 * sanity-checking.  In the case of an end-of-recovery checkpoint, we want
-	 * to just temporarily disable writing until the system has exited
-	 * recovery.
+	 * sanity-checking.
 	 */
 	if (shutdown)
-	{
-		if (flags & CHECKPOINT_END_OF_RECOVERY)
-			LocalXLogInsertAllowed = -1;	/* return to "check" state */
-		else
-			LocalXLogInsertAllowed = 0; /* never again write WAL */
-	}
+		LocalXLogInsertAllowed = 0; /* never again write WAL */
 
 	/*
 	 * We now have ProcLastRecPtr = start of actual checkpoint record, recptr
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index be7366379d0..e7b8f5d6c33 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -404,13 +404,6 @@ CheckpointerMain(void)
 
 			ConditionVariableBroadcast(&CheckpointerShmem->start_cv);
 
-			/*
-			 * The end-of-recovery checkpoint is a real checkpoint that's
-			 * performed while we're still in recovery.
-			 */
-			if (flags & CHECKPOINT_END_OF_RECOVERY)
-				do_restartpoint = false;
-
 			/*
 			 * We will warn if (a) too soon since last checkpoint (whatever
 			 * caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag
@@ -905,12 +898,10 @@ CheckpointerShmemInit(void)
  *
  * flags is a bitwise OR of the following:
  *	CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown.
- *	CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery.
  *	CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP,
  *		ignoring checkpoint_completion_target parameter.
  *	CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred
- *		since the last one (implied by CHECKPOINT_IS_SHUTDOWN or
- *		CHECKPOINT_END_OF_RECOVERY).
+ *		since the last one (implied by CHECKPOINT_IS_SHUTDOWN)
  *	CHECKPOINT_WAIT: wait for completion before returning (otherwise,
  *		just signal checkpointer to do it, and return).
  *	CHECKPOINT_CAUSE_XLOG: checkpoint is requested due to xlog filling.
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 08ebabfe96a..9aa3d5f9194 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1925,9 +1925,8 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
  * This is called at checkpoint time to write out all dirty shared buffers.
  * The checkpoint request flags should be passed in.  If CHECKPOINT_IMMEDIATE
  * is set, we disable delays between writes; if CHECKPOINT_IS_SHUTDOWN,
- * CHECKPOINT_END_OF_RECOVERY or CHECKPOINT_FLUSH_ALL is set, we write even
- * unlogged buffers, which are otherwise skipped.  The remaining flags
- * currently have no effect here.
+ * CHECKPOINT_FLUSH_ALL is set, we write even unlogged buffers, which are
+ * otherwise skipped.  The remaining flags currently have no effect here.
  */
 static void
 BufferSync(int flags)
@@ -1949,12 +1948,11 @@ BufferSync(int flags)
 	ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
 
 	/*
-	 * Unless this is a shutdown checkpoint or we have been explicitly told,
-	 * we write only permanent, dirty buffers.  But at shutdown or end of
-	 * recovery, we write all dirty buffers.
+	 * Unless this is a shutdown checkpoint or we have been explicitly told, we
+	 * write only permanent, dirty buffers.  But at shutdown, we write all dirty
+	 * buffers.
 	 */
-	if (!((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
-					CHECKPOINT_FLUSH_ALL))))
+	if (!((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FLUSH_ALL))))
 		mask |= BM_PERMANENT;
 
 	/*
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 5e2c94a05ff..661473d4669 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -194,18 +194,16 @@ extern bool XLOG_DEBUG;
 
 /* These directly affect the behavior of CreateCheckPoint and subsidiaries */
 #define CHECKPOINT_IS_SHUTDOWN	0x0001	/* Checkpoint is for shutdown */
-#define CHECKPOINT_END_OF_RECOVERY	0x0002	/* Like shutdown checkpoint, but
-											 * issued at end of WAL recovery */
-#define CHECKPOINT_IMMEDIATE	0x0004	/* Do it without delays */
-#define CHECKPOINT_FORCE		0x0008	/* Force even if no activity */
-#define CHECKPOINT_FLUSH_ALL	0x0010	/* Flush all pages, including those
+#define CHECKPOINT_IMMEDIATE	0x0002	/* Do it without delays */
+#define CHECKPOINT_FORCE		0x0004	/* Force even if no activity */
+#define CHECKPOINT_FLUSH_ALL	0x0008	/* Flush all pages, including those
 										 * belonging to unlogged tables */
 /* These are important to RequestCheckpoint */
-#define CHECKPOINT_WAIT			0x0020	/* Wait for completion */
-#define CHECKPOINT_REQUESTED	0x0040	/* Checkpoint request has been made */
+#define CHECKPOINT_WAIT			0x0010	/* Wait for completion */
+#define CHECKPOINT_REQUESTED	0x0020	/* Checkpoint request has been made */
 /* These indicate the cause of a checkpoint request */
-#define CHECKPOINT_CAUSE_XLOG	0x0080	/* XLOG consumption */
-#define CHECKPOINT_CAUSE_TIME	0x0100	/* Elapsed time */
+#define CHECKPOINT_CAUSE_XLOG	0x0040	/* XLOG consumption */
+#define CHECKPOINT_CAUSE_TIME	0x0080	/* Elapsed time */
 
 /*
  * Flag bits for the record being inserted, set using XLogSetRecordFlags().
-- 
2.18.0

Re: using an end-of-recovery record in all cases

Reply via email to