Hello, as in pgsql-bug ML.

https://www.postgresql.org/message-id/20180517.170021.24356216.horiguchi.kyot...@lab.ntt.co.jp

Master can go into infinite loop on shutdown. But it is caused by
a broken database like storage rolled-back one. (The steps to
replay this is shown in the above mail.)

I think this can be avoided by rejecting a standby if it reports
that write LSN is smaller than flush LSN after catching up.

Is it worth fixing?

# The patch is slightly different from that I posted to -bugs.

It is enough to chek for the invalid state just once but the
patch continues the check.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index e47ddca6bc..1916acf629 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1809,6 +1809,19 @@ ProcessStandbyReplyMessage(void)
 	if (replyRequested)
 		WalSndKeepalive(false);
 
+	/*
+	 * Once this stream catches up to WAL, writePtr cannot be smaller than
+	 * flushPtr. Otherwise we haven't reached the standby's starting LSN thus
+	 * this database cannot be a proper master of the standby. The state
+	 * causes infinite loop on shutdown.
+	 */
+	if (MyWalSnd->state >= WALSNDSTATE_CATCHUP &&
+		writePtr != InvalidXLogRecPtr && writePtr < flushPtr)
+		ereport(ERROR,
+				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				 errmsg("Standby started from the future LSN for this server"),
+				 errhint("This means that the standby is not created from this database.")));
+
 	/*
 	 * Update shared state for this WalSender process based on reply data from
 	 * standby.

Reply via email to