On Thu, Jan 27, 2022 at 02:06:40PM +0900, Michael Paquier wrote:
> So, I have been checking this idea in details, and spotted what looks
> like one issue in CreateRestartPoint(), as of:
>       /*
>        * Update pg_control, using current time.  Check that it still shows
>        * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do 
> nothing;
>        * this is a quick hack to make sure nothing really bad happens if 
> somehow
>        * we get here after the end-of-recovery checkpoint.
>        */
>       LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>       if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
>               ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
> 
> This change increases the window making this code path reachable if an
> end-of-recovery checkpoint is triggered but not finished at the end of
> recovery (possible of course at the end of crash recovery, but
> DB_IN_ARCHIVE_RECOVERY is also possible when !IsUnderPostmaster with a
> promotion request), before updating ControlFile->checkPointCopy at the
> end of the checkpoint because the state could still be
> DB_IN_ARCHIVE_RECOVERY.  The window is wider the longer the
> end-of-recovery checkpoint.  And this would be the case of an instance
> restarted, when a restart point is created.

I wonder if this is actually a problem in practice.  IIUC all of the values
updated in this block should be reset at the end of the end-of-recovery
checkpoint.  Is the intent of the quick hack to prevent those updates after
an end-of-recovery checkpoint completes, or is it trying to block them
after one begins?  It looks like the control file was being updated to
DB_SHUTDOWNING at the beginning of end-of-recovery checkpoints when that
change was first introduced (2de48a8), so I agree that we'd better be
careful with this change.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com/


Reply via email to