On Thu, Jan 27, 2022 at 02:06:40PM +0900, Michael Paquier wrote: > So, I have been checking this idea in details, and spotted what looks > like one issue in CreateRestartPoint(), as of: > /* > * Update pg_control, using current time. Check that it still shows > * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do > nothing; > * this is a quick hack to make sure nothing really bad happens if > somehow > * we get here after the end-of-recovery checkpoint. > */ > LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); > if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY && > ControlFile->checkPointCopy.redo < lastCheckPoint.redo) > > This change increases the window making this code path reachable if an > end-of-recovery checkpoint is triggered but not finished at the end of > recovery (possible of course at the end of crash recovery, but > DB_IN_ARCHIVE_RECOVERY is also possible when !IsUnderPostmaster with a > promotion request), before updating ControlFile->checkPointCopy at the > end of the checkpoint because the state could still be > DB_IN_ARCHIVE_RECOVERY. The window is wider the longer the > end-of-recovery checkpoint. And this would be the case of an instance > restarted, when a restart point is created.
I wonder if this is actually a problem in practice. IIUC all of the values updated in this block should be reset at the end of the end-of-recovery checkpoint. Is the intent of the quick hack to prevent those updates after an end-of-recovery checkpoint completes, or is it trying to block them after one begins? It looks like the control file was being updated to DB_SHUTDOWNING at the beginning of end-of-recovery checkpoints when that change was first introduced (2de48a8), so I agree that we'd better be careful with this change. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com/