Hello. As reported on the pgsql-bugs mailing list:
https://www.postgresql.org/message-id/20180517.170021.24356216.horiguchi.kyot...@lab.ntt.co.jp
The master can go into an infinite loop on shutdown. It only happens with a
broken database, for example one whose storage has been rolled back. (The
steps to reproduce this are shown in the mail above.)

I think this can be avoided by rejecting a standby if it reports a write LSN
that is smaller than its flush LSN after catching up.

Is it worth fixing?

# The patch is slightly different from the one I posted to -bugs. It would be
# enough to check for the invalid state just once, but the patch keeps
# performing the check.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e47ddca6bc..1916acf629 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1809,6 +1809,19 @@ ProcessStandbyReplyMessage(void)
 	if (replyRequested)
 		WalSndKeepalive(false);
 
+	/*
+	 * Once this stream catches up to WAL, writePtr cannot be smaller than
+	 * flushPtr. Otherwise we haven't reached the standby's starting LSN thus
+	 * this database cannot be a proper master of the standby. The state
+	 * causes infinite loop on shutdown.
+	 */
+	if (MyWalSnd->state >= WALSNDSTATE_CATCHUP &&
+		writePtr != InvalidXLogRecPtr && writePtr < flushPtr)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("Standby started from the future LSN for this server"),
+				 errhint("This means that the standby is not created from this database.")));
+
 	/*
 	 * Update shared state for this WalSender process based on reply data from
 	 * standby.
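
For readers who want to look at the condition on its own, it can be written as a small predicate. This is only a sketch and not part of the patch: the typedef, the InvalidXLogRecPtr macro and the enum are simplified stand-ins for the PostgreSQL definitions, and standby_reply_is_inconsistent is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumptions). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

typedef enum WalSndState
{
	WALSNDSTATE_STARTUP = 0,
	WALSNDSTATE_BACKUP,
	WALSNDSTATE_CATCHUP,
	WALSNDSTATE_STREAMING,
	WALSNDSTATE_STOPPING
} WalSndState;

/*
 * Hypothetical helper mirroring the test added by the patch: returns true
 * when a standby reply claims a write LSN behind its flush LSN while the
 * walsender is in CATCHUP state or later.
 */
bool
standby_reply_is_inconsistent(WalSndState state,
							  XLogRecPtr writePtr,
							  XLogRecPtr flushPtr)
{
	/* Before catch-up starts, the check is not applied. */
	if (state < WALSNDSTATE_CATCHUP)
		return false;

	/* A reply without a valid write position is not checked. */
	if (writePtr == InvalidXLogRecPtr)
		return false;

	/* Written WAL behind flushed WAL is the state the patch rejects. */
	return writePtr < flushPtr;
}
```

With this sketch, a reply with write LSN 100 and flush LSN 200 in WALSNDSTATE_STREAMING would be flagged, while the same reply in WALSNDSTATE_STARTUP would not be.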