Ashutosh Bapat noticed that WalSndWaitForWal() is setting waiting_for_ping_response after sending a keepalive that does *not* request a reply. The bad consequence is that other callers that do require a reply end up in not sending a keepalive, because they think it was already sent previously. So the whole thing gets stuck.
He found that commit 41d5f8ad734 failed to remove the setting of waiting_for_ping_response after changing the "request" parameter WalSndKeepalive from true to false; that seems to have been an omission and it breaks the algorithm. Thread at [1]. The simplest fix is just to remove the line that sets waiting_for_ping_response, but I think it is less error-prone to have WalSndKeepalive set the flag itself, instead of expecting its callers to do it (and know when to). Patch attached. Also rewords some related commentary. [1] https://postgr.es/m/flat/blu436-smtp25712b7ef9fc2adeb87c522dc...@phx.gbl -- Álvaro Herrera Valdivia, Chile
>From bf5ca106817744c1ba300ecb8c86d230789988e0 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera <alvhe...@alvh.no-ip.org> Date: Thu, 6 Aug 2020 14:57:09 -0400 Subject: [PATCH] Fix waiting_for_ping in walsender --- src/backend/replication/walsender.c | 29 ++++++++++++++--------------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 5e2210dd7b..c239902e0e 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -151,7 +151,7 @@ static XLogRecPtr sendTimeLineValidUpto = InvalidXLogRecPtr; * How far have we sent WAL already? This is also advertised in * MyWalSnd->sentPtr. (Actually, this is the next WAL location to send.) */ -static XLogRecPtr sentPtr = 0; +static XLogRecPtr sentPtr = InvalidXLogRecPtr; /* Buffers for constructing outgoing messages and processing reply messages. */ static StringInfoData output_message; @@ -1444,17 +1444,13 @@ WalSndWaitForWal(XLogRecPtr loc) * We only send regular messages to the client for full decoded * transactions, but a synchronous replication and walsender shutdown * possibly are waiting for a later location. So, before sleeping, we - * send a ping containing the flush location. If the receiver is - * otherwise idle, this keepalive will trigger a reply. Processing the - * reply will update these MyWalSnd locations. + * send a ping containing the flush location. A reply from standby is + * not needed and would be wasteful. */ if (MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && !waiting_for_ping_response) - { WalSndKeepalive(false); - waiting_for_ping_response = true; - } /* check whether we're done */ if (loc <= RecentFlushPtr) @@ -2932,10 +2928,7 @@ WalSndDone(WalSndSendDataCallback send_data) proc_exit(0); } if (!waiting_for_ping_response) - { WalSndKeepalive(true); - waiting_for_ping_response = true; - } } /* @@ -3432,10 +3425,13 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS) } /* - * This function is used to send a keepalive message to standby. - * If requestReply is set, sets a flag in the message requesting the standby - * to send a message back to us, for heartbeat purposes. - */ + * Send a keepalive message to standby. + * + * If requestReply is set, the message requests the other party to send + * a message back to us, for heartbeat purposes. We also set a flag to + * let nearby code that we're waiting for that response, to avoid + * repeated requests. + */ static void WalSndKeepalive(bool requestReply) { @@ -3450,6 +3446,10 @@ WalSndKeepalive(bool requestReply) /* ... and send it wrapped in CopyData */ pq_putmessage_noblock('d', output_message.data, output_message.len); + + /* Set local flag */ + if (requestReply) + waiting_for_ping_response = true; } /* @@ -3480,7 +3480,6 @@ WalSndKeepaliveIfNecessary(void) if (last_processing >= ping_time) { WalSndKeepalive(true); - waiting_for_ping_response = true; /* Try to flush pending output to the client */ if (pq_flush_if_writable() != 0) -- 2.20.1