Thank you for the comments! At Tue, 9 Nov 2021 09:53:15 +0900, Michael Paquier <mich...@paquier.xyz> wrote in > On Mon, Nov 08, 2021 at 02:59:46PM +0900, Kyotaro Horiguchi wrote: > > While checking the behavior for the case of missing-contrecord, I > > noticed that emode_for_corrupt_record() doesn't work as expected since > > readSource is reset to XLOG_FROM_ANY after a read failure. We could > > remember the last failed source but pg_wal should have been visited if > > page read error happened so I changed the function so that it treats > > XLOG_FROM_ANY the same way with XLOG_FROM_PG_WAL. > > FWIW, I am not much a fan of assuming that it is fine to use > XLOG_FROM_ANY as a condition here. The comments on top of > emode_for_corrupt_record() make it rather clear what the expectations > are, and this is the default readSource.
The readSource is expected by the function to be the failed source but it goes back to XLOG_FROM_ANY on page read failure. So the function *is* standing on the wrong assumption. I noticed that currentSource holds the last accessed source (but forgot about that). So it is exactly what we need here. No longer need to introduce the unclear assumption by using it. > > (Otherwise we see "LOG: reached end-of-WAL at .." message after > > "WARNING: missing contrecord at.." message.) > > + /* broken record found */ > + ereport(WARNING, > + (errmsg("redo is skipped"), > + errhint("This suggests WAL file corruption. You might > need to check the database."))); > This looks rather scary to me, FWIW, and this could easily be reached Yes, the message is intentionally scary, since we don't come here in the case of clean WAL:) > if one forgets about EndOfWAL while hacking on xlogreader.c. > Unlikely so, still. I don't understand. Isn't it the case of almost every feature? The patch compells hackers to maintain the condition for recovery being considered cleanly ended. If the last record doesn't meet the condition, the WAL file should be considered having a problem. However, I don't see the condition expanded to have another term in future. Even if someone including myself broke that condition, we will at worst unwantedly see a scary message. And I believe almost all hackers can easily find it a bug from the DETAILED message shown along aside. I'm not sure such bugs could be found in development phase, though.. > + report_invalid_record(state, > + "missing contrecord at %X/%X", > + LSN_FORMAT_ARGS(RecPtr)); > Isn't there a risk here to break applications checking after error > messages stored in the WAL reader after seeing a contrecord? I'm not sure you are mentioning the case where no message is stored previously, or the case where already a message is stored. The former is fine as the record is actually broken. But I was missing the latter case. In this version I avoided to overwite the error message. > + if (record->xl_tot_len == 0) > + { > + /* This is strictly not an invalid state, so phrase it as so. */ > + report_invalid_record(state, > + "record length is 0 at %X/%X", > + LSN_FORMAT_ARGS(RecPtr)); > + state->EndOfWAL = true; > + return false; > + } > This assumes that a value of 0 for xl_tot_len is a synonym of the end > of WAL, but cannot we have also a corrupted record in this case in the > shape of xl_tot_len being 0? We validate the full record after > reading the header, so it seems to me that we should not assume that > things are just ending as proposed in this patch. Yeah, it's the most serious concern to me. So I didn't hide the detailed message in the "end-of-wal reached message". > LOG: reached end of WAL at 0/512F198 on timeline 1 in pg_wal during crash > recovery > DETAIL: record length is 0 at 0/512F210 I believe everyone regards zero record length as fine unless something wrong is seen afterwards. However, we can extend the check to the whole record header. I think it is by far nearer to the perfect for almost all cases. The attached emits the following message for the good (true end-of-WAL) case. > LOG: reached end of WAL at 0/512F4A0 on timeline 1 in pg_wal during crash > recovery > DETAIL: empty record header found at 0/512F518 If garbage bytes are found in the header area, the following log will be left. I think we can have a better message here. > WARNING: garbage record header at 0/2095458 > LOG: redo done at 0/2095430 system usage: CPU: user: 0.03 s, system: 0.01 s, > elapsed: 0.04 s This is the updated version. - emode_for_currupt_record() now uses currentSource instead of readSource. - If zero record length is faced, make sure the whole header is zeroed before deciding it as the end-of-WAL. - Do not overwrite existig message when missing contrecord is detected. The message added here is seen in the TAP test log 026_overwrite_contrecord_standby.log regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From 1d5f6e707f8d67172eea79689c8a5f4d86889d3e Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Fri, 28 Feb 2020 15:52:58 +0900 Subject: [PATCH v6] Make End-Of-Recovery error less scary When recovery in any type ends, we see a bit scary error message like "invalid record length" that suggests something serious is happening. Actually if recovery meets a record with length = 0, that usually means it finished applying all available WAL records. Make this message less scary as "reached end of WAL". Instead raise the error level for other kind of WAL failure to WARNING. --- src/backend/access/transam/xlog.c | 89 +++++++++++++++++++------ src/backend/access/transam/xlogreader.c | 42 ++++++++++++ src/backend/replication/walreceiver.c | 3 +- src/include/access/xlogreader.h | 1 + 4 files changed, 111 insertions(+), 24 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5cda30836f..e90c69810b 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -4477,6 +4477,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode, for (;;) { char *errormsg; + XLogRecPtr ErrRecPtr = InvalidXLogRecPtr; record = XLogReadRecord(xlogreader, &errormsg); ReadRecPtr = xlogreader->ReadRecPtr; @@ -4494,6 +4495,16 @@ ReadRecord(XLogReaderState *xlogreader, int emode, { abortedRecPtr = xlogreader->abortedRecPtr; missingContrecPtr = xlogreader->missingContrecPtr; + ErrRecPtr = abortedRecPtr; + } + else + { + /* + * NULL ReadRecPtr means we could not read a record at + * beginning. In that case EndRecPtr is storing the LSN of the + * record we tried to read. + */ + ErrRecPtr = ReadRecPtr ? ReadRecPtr : EndRecPtr; } if (readFile >= 0) @@ -4503,13 +4514,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode, } /* - * We only end up here without a message when XLogPageRead() - * failed - in that case we already logged something. In - * StandbyMode that only happens if we have been triggered, so we - * shouldn't loop anymore in that case. + * If we get here for other than end-of-wal, emit the error message + * right now. Otherwise the message if any is shown as a part of + * the end-of-WAL message below. */ - if (errormsg) - ereport(emode_for_corrupt_record(emode, EndRecPtr), + if (!xlogreader->EndOfWAL && errormsg) + ereport(emode_for_corrupt_record(emode, ErrRecPtr), (errmsg_internal("%s", errormsg) /* already translated */ )); } @@ -4540,11 +4550,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode, /* Great, got a record */ return record; } - else + + /* No valid record available from this source */ + lastSourceFailed = true; + + if (!fetching_ckpt) { - /* No valid record available from this source */ - lastSourceFailed = true; - /* * If archive recovery was requested, but we were still doing * crash recovery, switch to archive recovery and retry using the @@ -4557,11 +4568,17 @@ ReadRecord(XLogReaderState *xlogreader, int emode, * we'd have no idea how far we'd have to replay to reach * consistency. So err on the safe side and give up. */ - if (!InArchiveRecovery && ArchiveRecoveryRequested && - !fetching_ckpt) + if (!InArchiveRecovery && ArchiveRecoveryRequested) { + /* + * We don't report this as LOG, since we don't stop recovery + * here + */ ereport(DEBUG1, - (errmsg_internal("reached end of WAL in pg_wal, entering archive recovery"))); + (errmsg_internal("reached end of WAL at %X/%X on timeline %u in %s during crash recovery, entering archive recovery", + LSN_FORMAT_ARGS(ErrRecPtr), + replayTLI, + xlogSourceNames[currentSource]))); InArchiveRecovery = true; if (StandbyModeRequested) StandbyMode = true; @@ -4609,12 +4626,33 @@ ReadRecord(XLogReaderState *xlogreader, int emode, continue; } - /* In standby mode, loop back to retry. Otherwise, give up. */ - if (StandbyMode && !CheckForStandbyTrigger()) - continue; - else - return NULL; + /* + * If we haven't emit an error message, we have safely reached the + * end-of-WAL. + */ + if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG) + { + char *fmt; + + if (StandbyMode) + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during standby mode"); + else if (InArchiveRecovery) + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during archive recovery"); + else + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during crash recovery"); + + ereport(LOG, + (errmsg(fmt, LSN_FORMAT_ARGS(ErrRecPtr), replayTLI, + xlogSourceNames[currentSource]), + (errormsg ? errdetail_internal("%s", errormsg) : 0))); + } } + + /* In standby mode, loop back to retry. Otherwise, give up. */ + if (StandbyMode && !CheckForStandbyTrigger()) + continue; + else + return NULL; } } @@ -7544,7 +7582,7 @@ StartupXLOG(void) else { /* just have to read next record after CheckPoint */ - record = ReadRecord(xlogreader, LOG, false, ThisTimeLineID); + record = ReadRecord(xlogreader, WARNING, false, ThisTimeLineID); } if (record != NULL) @@ -7781,7 +7819,7 @@ StartupXLOG(void) } /* Else, try to fetch the next WAL record */ - record = ReadRecord(xlogreader, LOG, false, ThisTimeLineID); + record = ReadRecord(xlogreader, WARNING, false, ThisTimeLineID); } while (record != NULL); /* @@ -7841,13 +7879,20 @@ StartupXLOG(void) InRedo = false; } - else + else if (xlogreader->EndOfWAL) { /* there are no WAL records following the checkpoint */ ereport(LOG, (errmsg("redo is not required"))); } + else + { + /* broken record found */ + ereport(WARNING, + (errmsg("redo is skipped"), + errhint("This suggests WAL file corruption. You might need to check the database."))); + } /* * This check is intentionally after the above log messages that @@ -13147,7 +13192,7 @@ emode_for_corrupt_record(int emode, XLogRecPtr RecPtr) { static XLogRecPtr lastComplaint = 0; - if (readSource == XLOG_FROM_PG_WAL && emode == LOG) + if (currentSource == XLOG_FROM_PG_WAL && emode <= WARNING) { if (RecPtr == lastComplaint) emode = DEBUG1; diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index f39f8044a9..273b927cd9 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -121,6 +121,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir, pfree(state); return NULL; } + state->EndOfWAL = false; state->errormsg_buf[0] = '\0'; /* @@ -292,6 +293,7 @@ XLogReadRecord(XLogReaderState *state, char **errormsg) /* reset error state */ *errormsg = NULL; state->errormsg_buf[0] = '\0'; + state->EndOfWAL = false; ResetDecoder(state); state->abortedRecPtr = InvalidXLogRecPtr; @@ -588,6 +590,16 @@ err: */ state->abortedRecPtr = RecPtr; state->missingContrecPtr = targetPagePtr; + + /* + * If the messages is not set yet, that means we failed to load the + * page for the record. Otherwise do not hide the existing message at + * it should be more detailed. + */ + if (state->errormsg_buf[0] == '\0') + report_invalid_record(state, + "missing contrecord at %X/%X", + LSN_FORMAT_ARGS(RecPtr)); } /* @@ -730,6 +742,36 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess) { + if (record->xl_tot_len == 0) + { + /* + * We are almost sure reaching the end of WAL, make sure that the whole + * header is zeroed. + */ + char *p = (char *)record; + char *pe = (char *)record + SizeOfXLogRecord; + + while (*p == 0 && p < pe) + p++; + + if (p == pe) + { + /* it is completely zeroed, call it a day */ + report_invalid_record(state, "empty record header found at %X/%X", + LSN_FORMAT_ARGS(RecPtr)); + + /* notify end-of-wal to callers */ + state->EndOfWAL = true; + } + else + { + /* Otherwise we found a garbage header.. */ + report_invalid_record(state, "garbage record header at %X/%X", + LSN_FORMAT_ARGS(RecPtr)); + } + + return false; + } if (record->xl_tot_len < SizeOfXLogRecord) { report_invalid_record(state, diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index 7a7eb3784e..ba3c4bd550 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -471,8 +471,7 @@ WalReceiverMain(void) else if (len < 0) { ereport(LOG, - (errmsg("replication terminated by primary server"), - errdetail("End of WAL reached on timeline %u at %X/%X.", + (errmsg("replication terminated by primary server on timeline %u at %X/%X.", startpointTLI, LSN_FORMAT_ARGS(LogstreamResult.Write)))); endofwal = true; diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h index de6fd791fe..1241b85838 100644 --- a/src/include/access/xlogreader.h +++ b/src/include/access/xlogreader.h @@ -174,6 +174,7 @@ struct XLogReaderState */ XLogRecPtr ReadRecPtr; /* start of last record read */ XLogRecPtr EndRecPtr; /* end+1 of last record read */ + bool EndOfWAL; /* the last attempt was EOW? */ /* * Set at the end of recovery: the start point of a partial record at the -- 2.27.0