At Mon, 14 Dec 2020 16:48:05 +0900, Michael Paquier <mich...@paquier.xyz> wrote in
> On Mon, Dec 14, 2020 at 11:34:51AM +0900, Kyotaro Horiguchi wrote:
> > Apart from this issue, while checking that, I noticed that if server
> > starts having WALs from a server of a different systemid, the server
> > stops with obscure messages.
>
> Wouldn't it be better to discuss that on a separate thread? I have
> mostly missed your message here.
Right. Here is a copy of that message. Thanks for the suggestion!

=====
While working on another discussion related to xlogreader[2], I noticed
that if the server starts with WAL from a server that has a different
system identifier, it stops with obscure messages:

> LOG: database system was shut down at 2020-12-14 10:36:02 JST
> LOG: invalid primary checkpoint record
> PANIC: could not locate a valid checkpoint record

The cause is that XLogPageRead() erases the error message set by
XLogReaderValidatePageHeader(). As the comment just above that code
says, this is required to continue replication in a certain situation.
The code aims to let replication continue when the first half of a
continuation record has already been removed on the primary, so we
don't need to clear the error unless we're in standby mode. If we run
that rescue code only in StandbyMode, we get the correct error message:

> JST LOG: database system was shut down at 2020-12-14 10:36:02 JST
> LOG: WAL file is from different database system: WAL file database system identifier is 6905923817995618754, pg_control database system identifier is 6905924227171453468
> JST LOG: invalid primary checkpoint record
> JST PANIC: could not locate a valid checkpoint record

I confirmed that 0668719801 still works in its intended context using
the steps shown in [1].

[1]: https://www.postgresql.org/message-id/flat/CACJqAM3xVz0JY1XFDKPP%2BJoJAjoGx%3DGNuOAshEDWCext7BFvCQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/flat/2B4510B2-3D70-4990-BFE3-0FE64041C08A%40amazon.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From d54531aa2774bad7e426cc16691553fbc8f0b3b3 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyoga....@gmail.com>
Date: Mon, 14 Dec 2020 11:18:08 +0900
Subject: [PATCH] Don't cancel invalid-page-header error in unwanted situation

Commit 0668719801 is intended to work during streaming replication, but
it cancels the error message regardless of the context. As a result,
ReadRecord fails to show the correct error message even when it is
required, that is, when not replicating. Allowing the cancellation to
happen only in standby mode fixes that.
---
 src/backend/access/transam/xlog.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 7e81ce4f17..770902518d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12055,7 +12055,8 @@ retry:
 	 * Validating the page header is cheap enough that doing it twice
 	 * shouldn't be a big deal from a performance point of view.
 	 */
-	if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+	if (StandbyMode &&
+		!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
 	{
 		/* reset any error XLogReaderValidatePageHeader() might have set */
 		xlogreader->errormsg_buf[0] = '\0';
-- 
2.27.0
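To make the behavior change easier to follow, here is a hypothetical, heavily simplified model in C. None of these names exist in PostgreSQL: ModelReader stands in for XLogReaderState, model_validate_page_header() for XLogReaderValidatePageHeader(), model_page_read() for the early check in XLogPageRead(), and model_read_record() for the later re-validation performed when the record is read. The "fixed" flag switching between pre-patch and patched behavior is also an assumption made only for illustration. It is a sketch of the idea, not the actual xlog.c logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the reader state; not PostgreSQL's struct. */
typedef struct ModelReader
{
	char		errormsg_buf[128];
	unsigned long long page_sysid;	/* system identifier in the WAL page */
	unsigned long long ctl_sysid;	/* system identifier in pg_control */
} ModelReader;

/* Stand-in for header validation: leaves a message on failure. */
static bool
model_validate_page_header(ModelReader *reader)
{
	assert(reader != NULL);
	if (reader->page_sysid != reader->ctl_sysid)
	{
		snprintf(reader->errormsg_buf, sizeof(reader->errormsg_buf),
				 "WAL file is from different database system");
		return false;
	}
	return true;
}

/*
 * Stand-in for the early check in the page-read callback.  Pre-patch
 * (fixed == false) the check always runs and wipes the message on
 * failure; patched (fixed == true) it runs only in standby mode.
 * Returns true if the page is accepted.
 */
static bool
model_page_read(ModelReader *reader, bool standby_mode, bool fixed)
{
	bool		check = fixed ? standby_mode : true;

	assert(reader != NULL);
	if (check && !model_validate_page_header(reader))
	{
		/* reset any error the validation might have set */
		reader->errormsg_buf[0] = '\0';
		return false;
	}
	return true;
}

/*
 * Stand-in for reading a record: once the page is accepted, the header
 * is validated again and that message is kept.  Returns the message
 * that would be reported ("" if none survives).
 */
static const char *
model_read_record(ModelReader *reader, bool standby_mode, bool fixed)
{
	reader->errormsg_buf[0] = '\0';
	if (model_page_read(reader, standby_mode, fixed))
		(void) model_validate_page_header(reader);
	return reader->errormsg_buf;
}
```

With mismatched system identifiers and standby_mode false, model_read_record() returns an empty message when fixed is false (the obscure failure above) and the "different database system" message when fixed is true. With standby_mode true the message is still cleared, which models how the rescue path keeps retrying during replication.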