Thanks! At Wed, 16 Jun 2021 16:52:11 +0900, Masahiko Sawada <sawada.m...@gmail.com> wrote in > On Fri, Jun 4, 2021 at 5:35 PM Kyotaro Horiguchi > <horikyota....@gmail.com> wrote: > > > > In a very common operation of accidentally specifying a recycled > > segment, pg_waldump often returns the following obscure message. > > > > $ pg_waldump 00000001000000000000002D > > pg_waldump: fatal: could not find a valid record after 0/2D000000 > > > > The more detailed message is generated internally and we can use it. > > That looks like the following. > > > > $ pg_waldump 00000001000000000000002D > > pg_waldump: fatal: unexpected pageaddr 0/24000000 in log segment > > 00000001000000000000002D, offset 0 > > > > Is it work doing? > > Perhaps we need both? The current message describes where the error > happened and the message internally generated describes the details. > It seems to me that both are useful. For example, if we find an error > during XLogReadRecord(), we show both as follows: > > if (errormsg) > fatal_error("error in WAL record at %X/%X: %s", > LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr), > errormsg);
Yeah, I thought that it might be a bit vervous and lengty but actually we have another place where doing that. One more point is whether we have a case where first_record is invalid but errormsg is NULL there. WALDumpReadPage immediately exits so we should always have a message in that case according to the comment in ReadRecord. > * We only end up here without a message when XLogPageRead() > * failed - in that case we already logged something. In > * StandbyMode that only happens if we have been triggered, so we > * shouldn't loop anymore in that case. So that can be an assertion. Now the messages looks like this. $ pg_waldump /home/horiguti/data/data_work/pg_wal/000000020000000000000010 pg_waldump: fatal: could not find a valid record after 0/0: unexpected pageaddr 0/9000000 in log segment 000000020000000000000010, offset 0 reagards. -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c index 42738eb940..5a98c523c6 100644 --- a/src/backend/access/transam/xlogreader.c +++ b/src/backend/access/transam/xlogreader.c @@ -932,17 +932,28 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr, * * This positions the reader, like XLogBeginRead(), so that the next call to * XLogReadRecord() will read the next valid record. + * + * If the page_read callback fails to read the requested data, + * InvalidXLogRecPtr is returned. The callback is expected to have reported + * the error; errormsg is set to NULL. + * + * If the reading fails for some other reason, InvalidXLogRecPtr is also + * returned, and *errormsg is set to a string with details of the failure. + * + * The returned pointer (or *errormsg) points to an internal buffer that's + * valid until the next call to XLogFindNextRecord or XLogReadRecord. */ XLogRecPtr -XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr) +XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr, char **errormsg) { XLogRecPtr tmpRecPtr; XLogRecPtr found = InvalidXLogRecPtr; XLogPageHeader header; - char *errormsg; Assert(!XLogRecPtrIsInvalid(RecPtr)); + *errormsg = NULL; + /* * skip over potential continuation data, keeping in mind that it may span * multiple pages @@ -1021,7 +1032,7 @@ XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr) * or we just jumped over the remaining data of a continuation. */ XLogBeginRead(state, tmpRecPtr); - while (XLogReadRecord(state, &errormsg) != NULL) + while (XLogReadRecord(state, errormsg) != NULL) { /* past the record we've found, break out */ if (RecPtr <= state->ReadRecPtr) @@ -1036,6 +1047,9 @@ XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr) err: XLogReaderInvalReadState(state); + if (state->errormsg_buf[0] != '\0') + *errormsg = state->errormsg_buf; + return InvalidXLogRecPtr; } diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index f8b8afe4a7..0dae1ebea7 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -1051,11 +1051,17 @@ main(int argc, char **argv) fatal_error("out of memory"); /* first find a valid recptr to start from */ - first_record = XLogFindNextRecord(xlogreader_state, private.startptr); + first_record = XLogFindNextRecord(xlogreader_state, private.startptr, + &errormsg); if (first_record == InvalidXLogRecPtr) - fatal_error("could not find a valid record after %X/%X", - LSN_FORMAT_ARGS(private.startptr)); + { + /* XLogFindNextRecord must return errormsg on error */ + Assert (errormsg); + fatal_error("could not find a valid record after %X/%X: %s", + LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr), + errormsg); + } /* * Display a message that we're skipping data if `from` wasn't a pointer diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h index 21d200d3df..0f6d920fed 100644 --- a/src/include/access/xlogreader.h +++ b/src/include/access/xlogreader.h @@ -267,7 +267,8 @@ extern void XLogReaderFree(XLogReaderState *state); /* Position the XLogReader to given record */ extern void XLogBeginRead(XLogReaderState *state, XLogRecPtr RecPtr); #ifdef FRONTEND -extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr); +extern XLogRecPtr XLogFindNextRecord(XLogReaderState *state, XLogRecPtr RecPtr, + char **errormsg); #endif /* FRONTEND */ /* Read the next XLog record. Returns NULL on end-of-WAL or failure */