At Tue, 26 Apr 2022 11:33:49 -0700, Nathan Bossart <nathandboss...@gmail.com> wrote in > On Wed, Mar 16, 2022 at 10:24:44AM +0900, Kyotaro Horiguchi wrote: > > While discussing on additional LSNs in checkpoint log message, > > Fujii-san pointed out [2] that there is a case where > > CreateRestartPoint leaves unrecoverable database when concurrent > > promotion happens. That corruption is "fixed" by the next checkpoint > > so it is not a severe corruption. > > I suspect we'll start seeing this problem more often once end-of-recovery > checkpoints are removed [0]. Would you mind creating a commitfest entry > for this thread? I didn't see one.
I'm not sure the patch makes any change here, because restart points don't run while crash recovery, since no checkpoint records seen during a crash recovery. Anyway the patch doesn't apply anymore so rebased, but only the one for master for the lack of time for now. > > AFAICS since 9.5, no check(/restart)pionts won't run concurrently with > > restartpoint [3]. So I propose to remove the code path as attached. > > Yeah, this "quick hack" has been around for some time (2de48a8), and I > believe much has changed since then, so something like what you're > proposing is probably the right thing to do. Thanks for checking! > > /* Also update the info_lck-protected copy */ > > SpinLockAcquire(&XLogCtl->info_lck); > > - XLogCtl->RedoRecPtr = lastCheckPoint.redo; > > + XLogCtl->RedoRecPtr = RedoRecPtr; > > SpinLockRelease(&XLogCtl->info_lck); > > > > /* > > @@ -6984,7 +6987,10 @@ CreateRestartPoint(int flags) > > /* Update the process title */ > > update_checkpoint_display(flags, true, false); > > > > - CheckPointGuts(lastCheckPoint.redo, flags); > > + CheckPointGuts(RedoRecPtr, flags); > > I don't understand the purpose of these changes. Are these related to the > fix, or is this just tidying up? The latter, since the mixed use of two not-guaranteed-to-be-same variables at the same time for the same purpose made me perplexed (but I feel the change can hardly incorporated alone). However, you're right that it is irrelevant to the fix, so removed including other instances of the same. > [0] > https://postgr.es/m/CA%2BTgmoY%2BSJLTjma4Hfn1sA7S6CZAgbihYd%3DKzO6srd7Ut%3DXVBQ%40mail.gmail.com regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From e764f47cec2476c458a2a1ff00228f982ea132a0 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Wed, 27 Apr 2022 10:41:23 +0900 Subject: [PATCH v5] Correctly update contfol file at the end of archive recovery CreateRestartPoint runs WAL file cleanup basing on the checkpoint just have finished in the function. If the database has exited DB_IN_ARCHIVE_RECOVERY state when the function is going to update control file, the function refrains from updating the file at all then proceeds to WAL cleanup having the latest REDO LSN, which is now inconsistent with the control file. As the result, the succeeding cleanup procedure overly removes WAL files against the control file and leaves unrecoverable database until the next checkpoint finishes. Along with that fix, we remove a dead code path for the case some other process ran a simultaneous checkpoint. It seems like just a preventive measure but it's no longer useful because we are sure that checkpoint is performed only by checkpointer except single process mode. --- src/backend/access/transam/xlog.c | 62 ++++++++++++++++++++----------- 1 file changed, 40 insertions(+), 22 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 61cda56c6f..a03b8de593 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -6921,6 +6921,9 @@ CreateRestartPoint(int flags) XLogSegNo _logSegNo; TimestampTz xtime; + /* we don't assume concurrent checkpoint/restartpoint to run */ + Assert (!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER); + /* Get a local copy of the last safe checkpoint record. */ SpinLockAcquire(&XLogCtl->info_lck); lastCheckPointRecPtr = XLogCtl->lastCheckPointRecPtr; @@ -7007,36 +7010,34 @@ CreateRestartPoint(int flags) CheckPointGuts(lastCheckPoint.redo, flags); + /* Update pg_control */ + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); + /* * Remember the prior checkpoint's redo ptr for * UpdateCheckPointDistanceEstimate() */ PriorRedoPtr = ControlFile->checkPointCopy.redo; + Assert (PriorRedoPtr < RedoRecPtr); + + ControlFile->checkPoint = lastCheckPointRecPtr; + ControlFile->checkPointCopy = lastCheckPoint; + /* - * Update pg_control, using current time. Check that it still shows - * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing; - * this is a quick hack to make sure nothing really bad happens if somehow - * we get here after the end-of-recovery checkpoint. + * Ensure minRecoveryPoint is past the checkpoint record while archive + * recovery is still ongoing. Normally, this will have happened already + * while writing out dirty buffers, but not necessarily - e.g. because no + * buffers were dirtied. We do this because a non-exclusive base backup + * uses minRecoveryPoint to determine which WAL files must be included in + * the backup, and the file (or files) containing the checkpoint record + * must be included, at a minimum. Note that for an ordinary restart of + * recovery there's no value in having the minimum recovery point any + * earlier than this anyway, because redo will begin just after the + * checkpoint record. */ - LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); - if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && - ControlFile->checkPointCopy.redo < lastCheckPoint.redo) + if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY) { - ControlFile->checkPoint = lastCheckPointRecPtr; - ControlFile->checkPointCopy = lastCheckPoint; - - /* - * Ensure minRecoveryPoint is past the checkpoint record. Normally, - * this will have happened already while writing out dirty buffers, - * but not necessarily - e.g. because no buffers were dirtied. We do - * this because a backup performed in recovery uses minRecoveryPoint to - * determine which WAL files must be included in the backup, and the - * file (or files) containing the checkpoint record must be included, - * at a minimum. Note that for an ordinary restart of recovery there's - * no value in having the minimum recovery point any earlier than this - * anyway, because redo will begin just after the checkpoint record. - */ if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr) { ControlFile->minRecoveryPoint = lastCheckPointEndPtr; @@ -7048,8 +7049,25 @@ CreateRestartPoint(int flags) } if (flags & CHECKPOINT_IS_SHUTDOWN) ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY; - UpdateControlFile(); } + else + { + /* recovery mode is not supposed to end during shutdown restartpoint */ + Assert((flags & CHECKPOINT_IS_SHUTDOWN) == 0); + + /* + * Aarchive recovery has ended. Crash recovery ever after should + * always recover to the end of WAL + */ + ControlFile->minRecoveryPoint = InvalidXLogRecPtr; + ControlFile->minRecoveryPointTLI = 0; + + /* also update local copy */ + LocalMinRecoveryPoint = InvalidXLogRecPtr; + LocalMinRecoveryPointTLI = 0; + } + + UpdateControlFile(); LWLockRelease(ControlFileLock); /* -- 2.27.0