On 2021-Jul-23, Andrey Borodin wrote:

> Hi!
>
> From time to time I observe $subj on clusters using logical replication.
> In most cases there are a lot of other errors. Probably $subj condition
> should be kind of impossible without other problems.
> I propose to enhance error logging of XLogReadRecord() in ReadPageInternal().

Hmm.  A small problem in this patch is that XLogReaderValidatePageHeader
already sets errormsg_buf; you're overwriting that.  I suggest leaving
that untouched.

There are two other cases where the problem occurs in the page_read()
callback; ReadPageInternal explicitly documents that it doesn't set the
error in that case.  We have two options to deal with that:

1. change all existing callbacks to set errormsg_buf depending on what
   actually fails, and then, if they return failure without an error
   message, add something like your proposed message.

2. throw an error directly in the callback rather than returning.  I don't
   think this strategy actually works.

I attach a cut-down patch that doesn't deal with the page_read callbacks
issue; I just added stub comments in xlog.c where something should be done.

-- 
Álvaro Herrera              39°49'30"S 73°17'W  -  https://www.EnterpriseDB.com/
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
               (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index d894af310a..83976cb014 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12467,6 +12467,7 @@ retry:
 										 private->replayTLI,
 										 xlogreader->EndRecPtr))
 		{
+			/* XXX should this path set errormsg_buf? */
 			if (readFile >= 0)
 				close(readFile);
 			readFile = -1;
@@ -12598,7 +12599,10 @@ next_record_is_invalid:
 	if (StandbyMode)
 		goto retry;
 	else
+	{
+		/* XXX should set errormsg_buf here */
 		return -1;
+	}
 }
 
 /*
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 3a7de02565..5b61593820 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -650,14 +650,22 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 										   state->currRecPtr,
 										   state->readBuf);
 		if (readLen < 0)
+		{
+			report_invalid_record(state,
+								  "attempt to read page of next segment failed at %X/%X",
+								  LSN_FORMAT_ARGS(targetSegmentPtr));
 			goto err;
+		}
 
 		/* we can be sure to have enough WAL available, we scrolled back */
 		Assert(readLen == XLOG_BLCKSZ);
 
 		if (!XLogReaderValidatePageHeader(state, targetSegmentPtr,
 										  state->readBuf))
+		{
+			/* XLogReaderValidatePageHeader sets errormsg_buf */
 			goto err;
+		}
 	}
 
 	/*
@@ -668,13 +676,18 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 									   state->currRecPtr,
 									   state->readBuf);
 	if (readLen < 0)
-		goto err;
+		goto err;	/* XXX errmsg? */
 
 	Assert(readLen <= XLOG_BLCKSZ);
 
 	/* Do we have enough data to check the header length? */
 	if (readLen <= SizeOfXLogShortPHD)
+	{
+		report_invalid_record(state,
+							  "unable to read short header of %d bytes at %X/%X",
+							  readLen, LSN_FORMAT_ARGS(pageptr));
 		goto err;
+	}
 
 	Assert(readLen >= reqLen);
 
@@ -687,14 +700,17 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 										   state->currRecPtr,
 										   state->readBuf);
 		if (readLen < 0)
-			goto err;
+			goto err;	/* XXX errmsg */
 	}
 
 	/*
 	 * Now that we know we have the full header, validate it.
 	 */
 	if (!XLogReaderValidatePageHeader(state, pageptr, (char *) hdr))
+	{
+		/* XLogReaderValidatePageHeader sets errormsg_buf */
 		goto err;
+	}
 
 	/* update read state information */
 	state->seg.ws_segno = targetSegNo;
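
For illustration only, option 1 from the message above (callbacks fill in errormsg_buf, and the caller adds a generic message only when a callback fails without one) might look roughly like the standalone toy below. This is a sketch, not PostgreSQL code: ToyReader, toy_report_error, toy_read_page and both callbacks are hypothetical names invented here, the buffer size is arbitrary, and page addresses are plain integers rather than XLogRecPtr values. toy_report_error is only loosely modeled on report_invalid_record.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TOY_ERRMSG_LEN 256		/* arbitrary size, chosen for this sketch */

/* Hypothetical stand-in for XLogReaderState; only the error buffer is modeled. */
typedef struct ToyReader
{
	char		errormsg_buf[TOY_ERRMSG_LEN];
} ToyReader;

/* Hypothetical callback type: returns bytes read, or -1 on failure. */
typedef int (*toy_page_read_cb) (ToyReader *reader, unsigned long pageptr);

/* Format a message into the reader's error buffer. */
static void
toy_report_error(ToyReader *reader, const char *fmt, ...)
{
	va_list		args;

	va_start(args, fmt);
	vsnprintf(reader->errormsg_buf, TOY_ERRMSG_LEN, fmt, args);
	va_end(args);
}

/* A callback that explains its own failure before returning -1. */
static int
cb_with_message(ToyReader *reader, unsigned long pageptr)
{
	toy_report_error(reader, "could not open segment for page %lu", pageptr);
	return -1;
}

/* A callback that fails without setting any message. */
static int
cb_silent(ToyReader *reader, unsigned long pageptr)
{
	(void) reader;
	(void) pageptr;
	return -1;
}

/*
 * Caller side: clear the buffer, invoke the callback, and add a generic
 * message only if the callback failed and left the buffer empty.  A
 * message set by the callback is never overwritten.
 */
static int
toy_read_page(ToyReader *reader, toy_page_read_cb cb, unsigned long pageptr)
{
	int			readLen;

	assert(reader != NULL && cb != NULL);

	reader->errormsg_buf[0] = '\0';
	readLen = cb(reader, pageptr);

	if (readLen < 0 && reader->errormsg_buf[0] == '\0')
		toy_report_error(reader,
						 "page read callback failed at page %lu without an error message",
						 pageptr);

	return readLen;
}
```

Calling toy_read_page with cb_with_message leaves the callback's own text in errormsg_buf, while cb_silent ends up with the generic fallback. The point of the empty-buffer check is the same as the review comment about XLogReaderValidatePageHeader: a more specific message that is already set should not be clobbered by the caller.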