At Wed, 3 Mar 2021 11:14:20 -0500, David Steele <da...@pgmasters.net> wrote in > Hi Kyotaro, > > On 3/27/20 10:25 PM, James Coleman wrote: > > On Thu, Mar 26, 2020 at 12:41 PM Robert Haas <robertmh...@gmail.com> > > wrote: > >> > >> I'm just spitballing here, but it would be really good if there's a > >> way to know definitely whether or not you should be scared. Corrupted > >> WAL segments are definitely a thing that happens, but retries are a > >> lot more common. > > First, I agree that getting enough context to say precisely is by far > > the ideal. > > That being said, as an end user who's found this surprising -- and > > momentarily scary every time I initially scan it even though I *know > > intellectually it's not* -- I would find Peter's suggestion a > > significant improvement over what we have now. I'm fairly certainly my > > co-workers on our database team would also. Knowing that something is > > at least not always scary is good. Though I'll grant that this does > > have the negative in reverse: if it actually is a scary > > situation...this mutes your concern level. On the other hand, > > monitoring would tell us if there's a real problem (namely replication > > lag), so I think the trade-off is clearly worth it. > > How about this minor tweak: > > HINT: This is expected if this is the end of currently available WAL. > > Otherwise, it could indicate corruption. > > Any thoughts on the suggestions for making the messaging clearer? > > Also, the patch no longer applies: > http://cfbot.cputube.org/patch_32_2490.log.
Sorry for missing the last discussions. I agree to the point about really-scary situation. ValidXLogRecordHeader deliberately marks End-Of-WAL only in the case of zero-length record so that the callers can identify that case, instead of inferring the EOW state without it. All other invalid data is treated as potentially danger situation. I think this is a reasonable classification. And the error level for the "danger" cases is changed to WARNING (from LOG). As the result, the following messages are emitted with the attached. - found zero-length record during recovery (the DETAIL might not be needed.) > LOG: redo starts at 0/14000118 > LOG: reached end of WAL at 0/14C5D070 on timeline 1 in pg_wal during crash > recovery > DETAIL: record length is 0 at 0/14C5D070 > LOG: redo done at 0/14C5CF48 system usage: ... - found another kind of invalid data > LOG: redo starts at 0/150000A0 > WARNING: invalid record length at 0/1500CA60: wanted 24, got 54 > LOG: redo done at 0/1500CA28 system usage: ... On the way checking the patch, I found that it emits the following log lines in the case the redo loop meets an invalid record at the starting: > LOG: invalid record length at 0/10000118: wanted 24, got 42 > LOG: redo is not required which doesn't look proper. That case is identifiable using the End-of_WAL flag this patch adds. Thus we get the following error messages. - found end-of-wal at the beginning of recovery > LOG: reached end of WAL at 0/130000A0 on timeline 1 in pg_wal during crash > recovery > DETAIL: record length is 0 at 0/130000A0 > LOG: redo is not required - found invalid data > WARNING: invalid record length at 0/120000A0: wanted 24, got 42 > WARNING: redo is skipped > HINT: This suggests WAL file corruption. You might need to check the > database. The logic of ErrRecPtr in ReadRecord may wrong. I remember having such an discussion before... regards. -- Kyotaro Horiguchi NTT Open Source Software Center
>From f89b5c965d0a49de3c1297bb5edd2dc061951b71 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota....@gmail.com> Date: Fri, 28 Feb 2020 15:52:58 +0900 Subject: [PATCH v4] Make End-Of-Recovery error less scary When recovery in any type ends, we see a bit scary error message like "invalid record length" that suggests something serious is happening. Make this message less scary as "reached end of WAL". --- src/backend/access/transam/xlog.c | 81 ++++++++++++++++++------- src/backend/access/transam/xlogreader.c | 11 ++++ src/backend/replication/walreceiver.c | 13 ++-- src/include/access/xlogreader.h | 1 + 4 files changed, 79 insertions(+), 27 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 377afb8732..fbcb8d78b8 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -4361,12 +4361,15 @@ ReadRecord(XLogReaderState *xlogreader, int emode, for (;;) { char *errormsg; + XLogRecPtr ErrRecPtr = InvalidXLogRecPtr; record = XLogReadRecord(xlogreader, &errormsg); ReadRecPtr = xlogreader->ReadRecPtr; EndRecPtr = xlogreader->EndRecPtr; if (record == NULL) { + ErrRecPtr = ReadRecPtr ? ReadRecPtr : EndRecPtr; + if (readFile >= 0) { close(readFile); @@ -4374,13 +4377,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode, } /* - * We only end up here without a message when XLogPageRead() - * failed - in that case we already logged something. In - * StandbyMode that only happens if we have been triggered, so we - * shouldn't loop anymore in that case. + * If we met other than end-of-wal, emit the error message right + * now. Otherwise the message if any is shown as a part of the + * end-of-WAL message below. */ - if (errormsg) - ereport(emode_for_corrupt_record(emode, EndRecPtr), + if (!xlogreader->EndOfWAL && errormsg) + ereport(emode_for_corrupt_record(emode, ErrRecPtr), (errmsg_internal("%s", errormsg) /* already translated */ )); } @@ -4411,11 +4413,12 @@ ReadRecord(XLogReaderState *xlogreader, int emode, /* Great, got a record */ return record; } - else + + /* No valid record available from this source */ + lastSourceFailed = true; + + if (!fetching_ckpt) { - /* No valid record available from this source */ - lastSourceFailed = true; - /* * If archive recovery was requested, but we were still doing * crash recovery, switch to archive recovery and retry using the @@ -4428,11 +4431,17 @@ ReadRecord(XLogReaderState *xlogreader, int emode, * we'd have no idea how far we'd have to replay to reach * consistency. So err on the safe side and give up. */ - if (!InArchiveRecovery && ArchiveRecoveryRequested && - !fetching_ckpt) + if (!InArchiveRecovery && ArchiveRecoveryRequested) { + /* + * We don't report this as LOG, since we don't stop recovery + * here + */ ereport(DEBUG1, - (errmsg_internal("reached end of WAL in pg_wal, entering archive recovery"))); + (errmsg_internal("reached end of WAL at %X/%X on timeline %u in %s during crash recovery, entering archive recovery", + LSN_FORMAT_ARGS(ErrRecPtr), + ThisTimeLineID, + xlogSourceNames[currentSource]))); InArchiveRecovery = true; if (StandbyModeRequested) StandbyMode = true; @@ -4480,12 +4489,33 @@ ReadRecord(XLogReaderState *xlogreader, int emode, continue; } - /* In standby mode, loop back to retry. Otherwise, give up. */ - if (StandbyMode && !CheckForStandbyTrigger()) - continue; - else - return NULL; + /* + * We reached the end of WAL, show the messages just once at the + * same LSN. + */ + if (emode_for_corrupt_record(LOG, ErrRecPtr) == LOG) + { + char *fmt; + + if (StandbyMode) + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during standby mode"); + else if (InArchiveRecovery) + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during archive recovery"); + else + fmt = gettext_noop("reached end of WAL at %X/%X on timeline %u in %s during crash recovery"); + + ereport(LOG, + (errmsg(fmt, LSN_FORMAT_ARGS(EndRecPtr), ThisTimeLineID, + xlogSourceNames[currentSource]), + (errormsg ? errdetail_internal("%s", errormsg) : 0))); + } } + + /* In standby mode, loop back to retry. Otherwise, give up. */ + if (StandbyMode && !CheckForStandbyTrigger()) + continue; + else + return NULL; } } @@ -7227,7 +7257,7 @@ StartupXLOG(void) else { /* just have to read next record after CheckPoint */ - record = ReadRecord(xlogreader, LOG, false); + record = ReadRecord(xlogreader, WARNING, false); } if (record != NULL) @@ -7454,7 +7484,7 @@ StartupXLOG(void) } /* Else, try to fetch the next WAL record */ - record = ReadRecord(xlogreader, LOG, false); + record = ReadRecord(xlogreader, WARNING, false); } while (record != NULL); /* @@ -7514,13 +7544,20 @@ StartupXLOG(void) InRedo = false; } - else + else if (xlogreader->EndOfWAL) { /* there are no WAL records following the checkpoint */ ereport(LOG, (errmsg("redo is not required"))); } + else + { + /* broken record found */ + ereport(WARNING, + (errmsg("redo is skipped"), + errhint("This suggests WAL file corruption. You might need to check the database."))); + } /* * This check is intentionally after the above log messages that @@ -12653,7 +12690,7 @@ emode_for_corrupt_record(int emode, XLogRecPtr RecPtr) { static XLogRecPtr lastComplaint = 0; - if (readSource == XLOG_FROM_PG_WAL && emode == LOG) + if (readSource == XLOG_FROM_PG_WAL && emode <= WARNING) { if (RecPtr == lastComplaint) emode = DEBUG1; diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 42738eb940..dacba32143 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -118,6 +118,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir, pfree(state); return NULL; } + state->EndOfWAL = false; state->errormsg_buf[0] = '\0'; /* @@ -288,6 +289,7 @@ XLogReadRecord(XLogReaderState *state, char **errormsg) /* reset error state */ *errormsg = NULL; state->errormsg_buf[0] = '\0'; + state->EndOfWAL = false; ResetDecoder(state); @@ -689,6 +691,15 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr, XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess) { + if (record->xl_tot_len == 0) + { + /* This is strictly not an invalid state, so phrase it as so. */ + report_invalid_record(state, + "record length is 0 at %X/%X", + (uint32) (RecPtr >> 32), (uint32) RecPtr); + state->EndOfWAL = true; + return false; + } if (record->xl_tot_len < SizeOfXLogRecord) { report_invalid_record(state, diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c index e5f8a06fea..2377c58b4c 100644 --- a/src/backend/replication/walreceiver.c +++ b/src/backend/replication/walreceiver.c @@ -460,12 +460,15 @@ WalReceiverMain(void) else if (len < 0) { ereport(LOG, - (errmsg("replication terminated by primary server"), - errdetail("End of WAL reached on timeline %u at %X/%X.", + (errmsg("replication terminated by primary server on timeline %u at %X/%X.", startpointTLI, - LSN_FORMAT_ARGS(LogstreamResult.Write)))); - endofwal = true; - break; + LSN_FORMAT_ARGS(LogstreamResult.Write)))); + + /* + * we have no longer anything to do on the broken + * connection other than exiting. + */ + proc_exit(1); } len = walrcv_receive(wrconn, &buf, &wait_fd); } diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h index 21d200d3df..0491adfc5b 100644 --- a/src/include/access/xlogreader.h +++ b/src/include/access/xlogreader.h @@ -174,6 +174,7 @@ struct XLogReaderState */ XLogRecPtr ReadRecPtr; /* start of last record read */ XLogRecPtr EndRecPtr; /* end+1 of last record read */ + bool EndOfWAL; /* the last attempt was EOW? */ /* ---------------------------------------- -- 2.27.0