At Mon, 20 Jun 2022 11:57:20 -0400, Robert Haas <robertmh...@gmail.com> wrote in > It seems to me that what we want to do is: if we're about to start > allowing WAL writes, then consider whether to insert an aborted > contrecord record. Now, if we are about to start allowing WAL write, > we must determine the LSN at which the first such record will be > written. Then there are two possibilities: either that LSN is on an > existing record boundary, or it isn't. In the former case, no aborted > contrecord record is required. In the latter case, we're writing at > LSN which would have been in the middle of a previous record, except > that the record in question was never finished or at least we don't > have all of it. So the first WAL we insert at our chosen starting LSN > must be an aborted contrecord record.
Right. > I'm not quite sure I understand the nature of the remaining bug here, > but this logic seems quite sketchy to me: > > /* > * When not in standby mode we find that WAL ends in an incomplete > * record, keep track of that record. After recovery is done, > * we'll write a record to indicate to downstream WAL readers that > * that portion is to be ignored. > */ > if (!StandbyMode && > !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr)) > { > abortedRecPtr = xlogreader->abortedRecPtr; > missingContrecPtr = xlogreader->missingContrecPtr; > } > > I don't see what StandbyMode has to do with anything here. The > question is whether the place where we're going to begin writing WAL > is on an existing record boundary or not. If it so happens that when > StandbyMode is turned on we can never decide to promote in the middle > of a record, then maybe there's no need to track this information when > StandbyMode = true, but I don't see a good reason why we should make > that assumption. What if in the future somebody added a command that Right. I came to the same conclusion before reading this section. It is rearer than other cases but surely possible. > says "don't keep trying to read more WAL, just promote RIGHT HERE?". I > think this logic would surely be incorrect in that case. It feels to > me like the right thing to do is to always keep track if WAL ends in > an incomplete record, and then when we promote, we write an aborted > contrecord record if WAL ended in an incomplete record, full stop. Agreed. Actually, with the second patch applied, removing !StandbyMode from the condition makes no difference in behavior. > Now, we do need to keep in mind that, while in StandbyMode, we might > reach the end of WAL more than once, because we have a polling loop > that looks for more WAL. So it does not work to just set the values > once and then assume that's the whole truth forever. But why not > handle that by storing the abortedRecPtr and missingContrecPtr > unconditionally after every call to XLogPrefetcherReadRecord()? They > will go back and forth between XLogRecPtrIsInvalid and other values > many times but the last value should -- I think -- be however things > ended up at the point where we decided to stop reading WAL. Unfortunately it doesn't work because we read a record already known to be complete again at the end of recovery. It is the reason of "abortedRecPtr < xlogreader->EndRecPtr" in my PoC patch. Without it, abrotedRecPtr is erased when it is actually needed. I don't like that expedient-looking condition, but the strict condition for resetting abortedRecPtr is iff "we have read a complete record at the same or grater LSN ofabortedRecPtr"... Come to think of this, I noticed that we can get rid of the file-global abortedRecPtr by letting FinishWalRecovery copy abortedRecPtr *before* reading the last record. This also allows us to remove the "expedient" condition. > I haven't really dived into this issue much so I might be totally off > base, but it just doesn't feel right to me that this should depend on > whether we're in standby mode. That seems at best incidentally related > to whether an aborted contrecord record is required. > > P.S. "aborted contrecord record" is really quite an awkward turn of phrase! Thats true! Always open for better phrasings:( regards. -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c index 6eba626420..c1be15039d 100644 --- a/src/backend/access/transam/xlogrecovery.c +++ b/src/backend/access/transam/xlogrecovery.c @@ -360,15 +360,6 @@ typedef struct XLogRecoveryCtlData static XLogRecoveryCtlData *XLogRecoveryCtl = NULL; -/* - * abortedRecPtr is the start pointer of a broken record at end of WAL when - * recovery completes; missingContrecPtr is the location of the first - * contrecord that went missing. See CreateOverwriteContrecordRecord for - * details. - */ -static XLogRecPtr abortedRecPtr; -static XLogRecPtr missingContrecPtr; - /* * if recoveryStopsBefore/After returns true, it saves information of the stop * point here @@ -944,12 +935,6 @@ InitWalRecovery(ControlFileData *ControlFile, bool *wasShutdown_ptr, minRecoveryPointTLI = 0; } - /* - * Start recovery assuming that the final record isn't lost. - */ - abortedRecPtr = InvalidXLogRecPtr; - missingContrecPtr = InvalidXLogRecPtr; - *wasShutdown_ptr = wasShutdown; *haveBackupLabel_ptr = haveBackupLabel; *haveTblspcMap_ptr = haveTblspcMap; @@ -1430,6 +1415,14 @@ FinishWalRecovery(void) lastRec = XLogRecoveryCtl->lastReplayedReadRecPtr; lastRecTLI = XLogRecoveryCtl->lastReplayedTLI; } + + /* + * record aborted contrecord status before the next ReadRecord erases the + * state + */ + result->abortedRecPtr = xlogreader->abortedRecPtr; + result->missingContrecPtr = xlogreader->missingContrecPtr; + XLogPrefetcherBeginRead(xlogprefetcher, lastRec); (void) ReadRecord(xlogprefetcher, PANIC, false, lastRecTLI); endOfLog = xlogreader->EndRecPtr; @@ -1503,9 +1496,6 @@ FinishWalRecovery(void) result->lastRecTLI = lastRecTLI; result->endOfLog = endOfLog; - result->abortedRecPtr = abortedRecPtr; - result->missingContrecPtr = missingContrecPtr; - result->standby_signal_file_found = standby_signal_file_found; result->recovery_signal_file_found = recovery_signal_file_found; @@ -1970,10 +1960,6 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI) LSN_FORMAT_ARGS(xlrec.overwritten_lsn), LSN_FORMAT_ARGS(record->overwrittenRecPtr)); - /* We have safely skipped the aborted record */ - abortedRecPtr = InvalidXLogRecPtr; - missingContrecPtr = InvalidXLogRecPtr; - ereport(LOG, (errmsg("successfully skipped missing contrecord at %X/%X, overwritten at %s", LSN_FORMAT_ARGS(xlrec.overwritten_lsn), @@ -2971,19 +2957,6 @@ ReadRecord(XLogPrefetcher *xlogprefetcher, int emode, record = XLogPrefetcherReadRecord(xlogprefetcher, &errormsg); if (record == NULL) { - /* - * When not in standby mode we find that WAL ends in an incomplete - * record, keep track of that record. After recovery is done, - * we'll write a record to indicate to downstream WAL readers that - * that portion is to be ignored. - */ - if (!StandbyMode && - !XLogRecPtrIsInvalid(xlogreader->abortedRecPtr)) - { - abortedRecPtr = xlogreader->abortedRecPtr; - missingContrecPtr = xlogreader->missingContrecPtr; - } - if (readFile >= 0) { close(readFile);