Hello. At Thu, 24 May 2018 16:57:07 +0900, Michael Paquier <mich...@paquier.xyz> wrote in <20180524075707.ge15...@paquier.xyz> > On Mon, May 14, 2018 at 01:14:22PM +0530, Pavan Deolasee wrote: > > Looks like I didn't understand Alvaro's comment when he mentioned it to me > > off-list. But I now see what Michael and Alvaro mean and that indeed seems > > like a problem. I was thinking that the test for (ControlFile->state == > > DB_IN_ARCHIVE_RECOVERY) will ensure that minRecoveryPoint can't be updated > > after the standby is promoted. While that's true for a DB_IN_PRODUCTION, > > the > > RestartPoint may finish after we have written end-of-recovery record, but > > before we're in production and thus the minRecoveryPoint may again be set. > > Yeah, this has been something I considered as well first, but I was not > confident enough that setting up minRecoveryPoint to InvalidXLogRecPtr > was actually a safe thing for timeline switches. > > So I have spent a good portion of today testing and playing with it to > be confident enough that this was right, and I have finished with the > attached. The patch adds a new flag to XLogCtl which marks if the > control file has been updated after the end-of-recovery record has been > written, so as minRecoveryPoint does not get updated because of a > restart point running in parallel. > > I have also reworked the test case you sent, removing the manuals sleeps > and replacing them with correct wait points. There is also no point to > wait after promotion as pg_ctl promote implies a wait. Another > important thing is that you need to use wal_log_hints = off to see a > crash, which is something that allows_streaming actually enables. > > Comments are welcome.
As the result of some poking around, my dignosis is different from yours. (I believe that) By definition recovery doesn't end until the end-of-recovery check point ends so from the viewpoint I think it is wrong to clear ControlFile->minRecoveryPoint before the end. Invalid-page checking during crash recovery is hamful rather than useless. It is done by CheckRecoveryConsistency even in crash recovery against its expectation because there's a case where minRecoveryPoint is valid but InArchiveRecovery is false. The two variable there seems to be in contradiction with each other. The immediate cause of the contradition is that StartXLOG wrongly initializes local minRecoveryPoint from control file's value even under crash recovery. updateMinRecoveryPoint also should be turned off during crash recovery. The case of crash after last checkpoint end has been treated in UpdateMinRecoveryPoint but forgot to consider this case, where crash recovery has been started while control file is still holding valid minRecoveryPoint. Finally, the attached patch seems fixing the issue. It passes 015_promotion.pl and the problem doesn't happen with 014_promotion_bug.pl. Also this passes the existing tests 002-014. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From 192c36bc96135d9108c499d01016f7eb6fc28fd1 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp> Date: Thu, 7 Jun 2018 16:02:41 +0900 Subject: [PATCH] Fix wrong behavior during crash recovery After being killed before end-of-recovery checkpoint ends, server restarts in crash recovery mode. However it can wrongly performs invalid-page checks of WAL records, which is not expected during crash recovery, then results in PANIC. This patch prevents the wrong checking by suppressing needless updates of minRecoveryPoint during crash recovery. --- src/backend/access/transam/xlog.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index adbd6a2126..283c26cb6c 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -821,8 +821,11 @@ static XLogSource XLogReceiptSource = 0; /* XLOG_FROM_* code */ static XLogRecPtr ReadRecPtr; /* start of last record read */ static XLogRecPtr EndRecPtr; /* end+1 of last record read */ -static XLogRecPtr minRecoveryPoint; /* local copy of - * ControlFile->minRecoveryPoint */ +/* + * local copy of ControlFile->minRecoveryPoint. InvalidXLogRecPtr means we're + * in crash recovery + */ +static XLogRecPtr minRecoveryPoint = InvalidXLogRecPtr; static TimeLineID minRecoveryPointTLI; static bool updateMinRecoveryPoint = true; @@ -2717,14 +2720,7 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) minRecoveryPoint = ControlFile->minRecoveryPoint; minRecoveryPointTLI = ControlFile->minRecoveryPointTLI; - /* - * An invalid minRecoveryPoint means that we need to recover all the WAL, - * i.e., we're doing crash recovery. We never modify the control file's - * value in that case, so we can short-circuit future checks here too. - */ - if (minRecoveryPoint == 0) - updateMinRecoveryPoint = false; - else if (force || minRecoveryPoint < lsn) + if (force || minRecoveryPoint < lsn) { XLogRecPtr newMinRecoveryPoint; TimeLineID newMinRecoveryPointTLI; @@ -6841,6 +6837,7 @@ StartupXLOG(void) ControlFile->checkPointCopy.ThisTimeLineID, recoveryTargetTLI))); ControlFile->state = DB_IN_CRASH_RECOVERY; + updateMinRecoveryPoint = false; } ControlFile->checkPoint = checkPointLoc; ControlFile->checkPointCopy = checkPoint; @@ -6852,6 +6849,10 @@ StartupXLOG(void) ControlFile->minRecoveryPoint = checkPoint.redo; ControlFile->minRecoveryPointTLI = checkPoint.ThisTimeLineID; } + + /* also initialize our local copies */ + minRecoveryPoint = ControlFile->minRecoveryPoint; + minRecoveryPointTLI = ControlFile->minRecoveryPointTLI; } /* @@ -6889,10 +6890,6 @@ StartupXLOG(void) /* No need to hold ControlFileLock yet, we aren't up far enough */ UpdateControlFile(); - /* initialize our local copy of minRecoveryPoint */ - minRecoveryPoint = ControlFile->minRecoveryPoint; - minRecoveryPointTLI = ControlFile->minRecoveryPointTLI; - /* * Reset pgstat data, because it may be invalid after recovery. */ @@ -7858,6 +7855,8 @@ CheckRecoveryConsistency(void) if (XLogRecPtrIsInvalid(minRecoveryPoint)) return; + Assert(InArchiveRecovery); + /* * assume that we are called in the startup process, and hence don't need * a lock to read lastReplayedEndRecPtr -- 2.16.3