On Wed, Oct 6, 2021 at 7:24 PM Amul Sul <sula...@gmail.com> wrote: > > On Tue, Oct 5, 2021 at 10:42 PM Robert Haas <robertmh...@gmail.com> wrote: > > > > On Tue, Oct 5, 2021 at 12:41 PM Amul Sul <sula...@gmail.com> wrote: > > > No, InRecovery flag get cleared before this point. I think, we can use > > > lastReplayedEndRecPtr what you have suggested in other thread. > > > > Hmm, right, that makes sense. Perhaps I should start remembering what > > I said in my own emails. > > > > Here I end up with the attached version where I have dropped the > changes for standby.c and 018_wal_optimize.pl files. Also, I am not > sure that we should have the changes for bgwriter.c and slot.c in this > patch, but that's not touched. >
Attached is the rebased and updated version. The patch removes the newly introduced PerformRecoveryXLogAction() function. In addition to that, removed the CHECKPOINT_END_OF_RECOVERY flag and its related code. Also, dropped changes for bgwriter.c and slot.c in this patch, which seem unrelated to this work. Regards, Amul
From 7cf4d270ef8649426fb16f2594a7ed7dcf8cfa5d Mon Sep 17 00:00:00 2001 From: Amul Sul <amul.sul@enterprisedb.com> Date: Fri, 15 Oct 2021 10:38:55 -0400 Subject: [PATCH v3] Always use an end-of-recovery record rather than a checkpoint. Robert Haas, with modifications by Amul Sul. --- src/backend/access/transam/xlog.c | 144 ++++++-------------------- src/backend/postmaster/checkpointer.c | 11 +- src/backend/storage/buffer/bufmgr.c | 14 ++- src/include/access/xlog.h | 16 ++- 4 files changed, 47 insertions(+), 138 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 62862255fca..c6543f35da4 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -939,7 +939,6 @@ static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force); static XLogRecord *ReadRecord(XLogReaderState *xlogreader, int emode, bool fetching_ckpt); static void CheckRecoveryConsistency(void); -static bool PerformRecoveryXLogAction(void); static XLogRecord *ReadCheckpointRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int whichChkpt, bool report); static bool rescanLatestTimeLine(void); @@ -6634,7 +6633,6 @@ StartupXLOG(void) DBState dbstate_at_startup; XLogReaderState *xlogreader; XLogPageReadPrivate private; - bool promoted = false; struct stat st; /* @@ -8083,7 +8081,7 @@ StartupXLOG(void) LocalXLogInsertAllowed = -1; /* - * Emit checkpoint or end-of-recovery record in XLOG, if required. + * Insert a special WAL record to mark the end of recovery. * * XLogCtl->lastReplayedEndRecPtr will be a valid LSN if and only if we * entered recovery. Even if we ultimately replayed no WAL records, it will @@ -8092,7 +8090,7 @@ StartupXLOG(void) * we reach this code. */ if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr)) - promoted = PerformRecoveryXLogAction(); + CreateEndOfRecoveryRecord(); /* If this is archive recovery, perform post-recovery cleanup actions. */ if (ArchiveRecoveryRequested) @@ -8156,12 +8154,22 @@ StartupXLOG(void) WalSndWakeup(); /* - * If this was a promotion, request an (online) checkpoint now. This isn't - * required for consistency, but the last restartpoint might be far back, - * and in case of a crash, recovering from it might take a longer than is - * appropriate now that we're not in standby mode anymore. + * Request an (online) checkpoint now. Note that, until this is complete, + * a crash would start replay from the same WAL location we did, or from + * the last restartpoint that completed. We don't want to let that + * situation persist for longer than necessary, since users don't like + * long recovery times. On the other hand, they also want to be able to + * start doing useful work again as quickly as possible. Therfore, we + * don't pass CHECKPOINT_IMMEDIATE to avoid bogging down the system. + * + * Note that the consequence of requesting a checkpoint here only after + * we've allowed WAL writes is that a single checkpoint cycle can span + * multiple server lifetimes. So for example if you want to something to + * happen at least once per checkpoint cycle or at most once per + * checkpoint cycle, you have to consider what happens if the server + * is restarted someplace in the middle. */ - if (promoted) + if (!XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr)) RequestCheckpoint(CHECKPOINT_FORCE); } @@ -8261,60 +8269,6 @@ CheckRecoveryConsistency(void) } } -/* - * Perform whatever XLOG actions are necessary at end of REDO. - * - * The goal here is to make sure that we'll be able to recover properly if - * we crash again. If we choose to write a checkpoint, we'll write a shutdown - * checkpoint rather than an on-line one. This is not particularly critical, - * but since we may be assigning a new TLI, using a shutdown checkpoint allows - * us to have the rule that TLI only changes in shutdown checkpoints, which - * allows some extra error checking in xlog_redo. - */ -static bool -PerformRecoveryXLogAction(void) -{ - bool promoted = false; - - /* - * Perform a checkpoint to update all our recovery activity to disk. - * - * Note that we write a shutdown checkpoint rather than an on-line one. This - * is not particularly critical, but since we may be assigning a new TLI, - * using a shutdown checkpoint allows us to have the rule that TLI only - * changes in shutdown checkpoints, which allows some extra error checking - * in xlog_redo. - * - * In promotion, only create a lightweight end-of-recovery record instead of - * a full checkpoint. A checkpoint is requested later, after we're fully out - * of recovery mode and already accepting queries. - */ - if (ArchiveRecoveryRequested && IsUnderPostmaster && - LocalPromoteIsTriggered) - { - promoted = true; - - /* - * Insert a special WAL record to mark the end of recovery, since we - * aren't doing a checkpoint. That means that the checkpointer process - * may likely be in the middle of a time-smoothed restartpoint and could - * continue to be for minutes after this. That sounds strange, but the - * effect is roughly the same and it would be stranger to try to come - * out of the restartpoint and then checkpoint. We request a checkpoint - * later anyway, just for safety. - */ - CreateEndOfRecoveryRecord(); - } - else - { - RequestCheckpoint(CHECKPOINT_END_OF_RECOVERY | - CHECKPOINT_IMMEDIATE | - CHECKPOINT_WAIT); - } - - return promoted; -} - /* * Is the system still in recovery? * @@ -8793,9 +8747,8 @@ LogCheckpointStart(int flags, bool restartpoint) if (restartpoint) ereport(LOG, /* translator: the placeholders show checkpoint options */ - (errmsg("restartpoint starting:%s%s%s%s%s%s%s%s", + (errmsg("restartpoint starting:%s%s%s%s%s%s%s", (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "", - (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "", (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "", (flags & CHECKPOINT_FORCE) ? " force" : "", (flags & CHECKPOINT_WAIT) ? " wait" : "", @@ -8805,9 +8758,8 @@ LogCheckpointStart(int flags, bool restartpoint) else ereport(LOG, /* translator: the placeholders show checkpoint options */ - (errmsg("checkpoint starting:%s%s%s%s%s%s%s%s", + (errmsg("checkpoint starting:%s%s%s%s%s%s%s", (flags & CHECKPOINT_IS_SHUTDOWN) ? " shutdown" : "", - (flags & CHECKPOINT_END_OF_RECOVERY) ? " end-of-recovery" : "", (flags & CHECKPOINT_IMMEDIATE) ? " immediate" : "", (flags & CHECKPOINT_FORCE) ? " force" : "", (flags & CHECKPOINT_WAIT) ? " wait" : "", @@ -8953,13 +8905,12 @@ static void update_checkpoint_display(int flags, bool restartpoint, bool reset) { /* - * The status is reported only for end-of-recovery and shutdown - * checkpoints or shutdown restartpoints. Updating the ps display is - * useful in those situations as it may not be possible to rely on - * pg_stat_activity to see the status of the checkpointer or the startup - * process. + * The status is reported only for shutdown checkpoints or shutdown + * restartpoints. Updating the ps display is useful in those situations as + * it may not be possible to rely on pg_stat_activity to see the status of + * the checkpointer or the startup process. */ - if ((flags & (CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IS_SHUTDOWN)) == 0) + if ((flags & CHECKPOINT_IS_SHUTDOWN) == 0) return; if (reset) @@ -8968,8 +8919,7 @@ update_checkpoint_display(int flags, bool restartpoint, bool reset) { char activitymsg[128]; - snprintf(activitymsg, sizeof(activitymsg), "performing %s%s%s", - (flags & CHECKPOINT_END_OF_RECOVERY) ? "end-of-recovery " : "", + snprintf(activitymsg, sizeof(activitymsg), "performing %s%s", (flags & CHECKPOINT_IS_SHUTDOWN) ? "shutdown " : "", restartpoint ? "restartpoint" : "checkpoint"); set_ps_display(activitymsg); @@ -8982,12 +8932,10 @@ update_checkpoint_display(int flags, bool restartpoint, bool reset) * * flags is a bitwise OR of the following: * CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown. - * CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery. * CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP, * ignoring checkpoint_completion_target parameter. * CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred - * since the last one (implied by CHECKPOINT_IS_SHUTDOWN or - * CHECKPOINT_END_OF_RECOVERY). + * since the last one (implied by CHECKPOINT_IS_SHUTDOWN). * CHECKPOINT_FLUSH_ALL: also flush buffers of unlogged tables. * * Note: flags contains other bits, of interest here only for logging purposes. @@ -9021,17 +8969,11 @@ CreateCheckPoint(int flags) VirtualTransactionId *vxids; int nvxids; - /* - * An end-of-recovery checkpoint is really a shutdown checkpoint, just - * issued at a different time. - */ - if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) - shutdown = true; - else - shutdown = false; + /* Is shutdown checkpoint? */ + shutdown = ((flags & CHECKPOINT_IS_SHUTDOWN) != 0); /* sanity check */ - if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0) + if (RecoveryInProgress()) elog(ERROR, "can't create a checkpoint during recovery"); /* @@ -9107,8 +9049,7 @@ CreateCheckPoint(int flags) * WAL activity requiring a checkpoint, skip it. The idea here is to * avoid inserting duplicate checkpoints when the system is idle. */ - if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY | - CHECKPOINT_FORCE)) == 0) + if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FORCE)) == 0) { if (last_important_lsn == ControlFile->checkPoint) { @@ -9120,20 +9061,8 @@ CreateCheckPoint(int flags) } } - /* - * An end-of-recovery checkpoint is created before anyone is allowed to - * write WAL. To allow us to write the checkpoint record, temporarily - * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is - * initialized, which we need here and in AdvanceXLInsertBuffer.) - */ - if (flags & CHECKPOINT_END_OF_RECOVERY) - LocalSetXLogInsertAllowed(); - checkPoint.ThisTimeLineID = ThisTimeLineID; - if (flags & CHECKPOINT_END_OF_RECOVERY) - checkPoint.PrevTimeLineID = XLogCtl->PrevTimeLineID; - else - checkPoint.PrevTimeLineID = ThisTimeLineID; + checkPoint.PrevTimeLineID = ThisTimeLineID; checkPoint.fullPageWrites = Insert->fullPageWrites; @@ -9300,17 +9229,10 @@ CreateCheckPoint(int flags) /* * We mustn't write any new WAL after a shutdown checkpoint, or it will be * overwritten at next startup. No-one should even try, this just allows - * sanity-checking. In the case of an end-of-recovery checkpoint, we want - * to just temporarily disable writing until the system has exited - * recovery. + * sanity-checking. */ if (shutdown) - { - if (flags & CHECKPOINT_END_OF_RECOVERY) - LocalXLogInsertAllowed = -1; /* return to "check" state */ - else - LocalXLogInsertAllowed = 0; /* never again write WAL */ - } + LocalXLogInsertAllowed = 0; /* never again write WAL */ /* * We now have ProcLastRecPtr = start of actual checkpoint record, recptr diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c index be7366379d0..e7b8f5d6c33 100644 --- a/src/backend/postmaster/checkpointer.c +++ b/src/backend/postmaster/checkpointer.c @@ -404,13 +404,6 @@ CheckpointerMain(void) ConditionVariableBroadcast(&CheckpointerShmem->start_cv); - /* - * The end-of-recovery checkpoint is a real checkpoint that's - * performed while we're still in recovery. - */ - if (flags & CHECKPOINT_END_OF_RECOVERY) - do_restartpoint = false; - /* * We will warn if (a) too soon since last checkpoint (whatever * caused it) and (b) somebody set the CHECKPOINT_CAUSE_XLOG flag @@ -905,12 +898,10 @@ CheckpointerShmemInit(void) * * flags is a bitwise OR of the following: * CHECKPOINT_IS_SHUTDOWN: checkpoint is for database shutdown. - * CHECKPOINT_END_OF_RECOVERY: checkpoint is for end of WAL recovery. * CHECKPOINT_IMMEDIATE: finish the checkpoint ASAP, * ignoring checkpoint_completion_target parameter. * CHECKPOINT_FORCE: force a checkpoint even if no XLOG activity has occurred - * since the last one (implied by CHECKPOINT_IS_SHUTDOWN or - * CHECKPOINT_END_OF_RECOVERY). + * since the last one (implied by CHECKPOINT_IS_SHUTDOWN) * CHECKPOINT_WAIT: wait for completion before returning (otherwise, * just signal checkpointer to do it, and return). * CHECKPOINT_CAUSE_XLOG: checkpoint is requested due to xlog filling. diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c index 08ebabfe96a..9aa3d5f9194 100644 --- a/src/backend/storage/buffer/bufmgr.c +++ b/src/backend/storage/buffer/bufmgr.c @@ -1925,9 +1925,8 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner) * This is called at checkpoint time to write out all dirty shared buffers. * The checkpoint request flags should be passed in. If CHECKPOINT_IMMEDIATE * is set, we disable delays between writes; if CHECKPOINT_IS_SHUTDOWN, - * CHECKPOINT_END_OF_RECOVERY or CHECKPOINT_FLUSH_ALL is set, we write even - * unlogged buffers, which are otherwise skipped. The remaining flags - * currently have no effect here. + * CHECKPOINT_FLUSH_ALL is set, we write even unlogged buffers, which are + * otherwise skipped. The remaining flags currently have no effect here. */ static void BufferSync(int flags) @@ -1949,12 +1948,11 @@ BufferSync(int flags) ResourceOwnerEnlargeBuffers(CurrentResourceOwner); /* - * Unless this is a shutdown checkpoint or we have been explicitly told, - * we write only permanent, dirty buffers. But at shutdown or end of - * recovery, we write all dirty buffers. + * Unless this is a shutdown checkpoint or we have been explicitly told, we + * write only permanent, dirty buffers. But at shutdown, we write all dirty + * buffers. */ - if (!((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY | - CHECKPOINT_FLUSH_ALL)))) + if (!((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_FLUSH_ALL)))) mask |= BM_PERMANENT; /* diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 5e2c94a05ff..661473d4669 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -194,18 +194,16 @@ extern bool XLOG_DEBUG; /* These directly affect the behavior of CreateCheckPoint and subsidiaries */ #define CHECKPOINT_IS_SHUTDOWN 0x0001 /* Checkpoint is for shutdown */ -#define CHECKPOINT_END_OF_RECOVERY 0x0002 /* Like shutdown checkpoint, but - * issued at end of WAL recovery */ -#define CHECKPOINT_IMMEDIATE 0x0004 /* Do it without delays */ -#define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */ -#define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those +#define CHECKPOINT_IMMEDIATE 0x0002 /* Do it without delays */ +#define CHECKPOINT_FORCE 0x0004 /* Force even if no activity */ +#define CHECKPOINT_FLUSH_ALL 0x0008 /* Flush all pages, including those * belonging to unlogged tables */ /* These are important to RequestCheckpoint */ -#define CHECKPOINT_WAIT 0x0020 /* Wait for completion */ -#define CHECKPOINT_REQUESTED 0x0040 /* Checkpoint request has been made */ +#define CHECKPOINT_WAIT 0x0010 /* Wait for completion */ +#define CHECKPOINT_REQUESTED 0x0020 /* Checkpoint request has been made */ /* These indicate the cause of a checkpoint request */ -#define CHECKPOINT_CAUSE_XLOG 0x0080 /* XLOG consumption */ -#define CHECKPOINT_CAUSE_TIME 0x0100 /* Elapsed time */ +#define CHECKPOINT_CAUSE_XLOG 0x0040 /* XLOG consumption */ +#define CHECKPOINT_CAUSE_TIME 0x0080 /* Elapsed time */ /* * Flag bits for the record being inserted, set using XLogSetRecordFlags(). -- 2.18.0