Re: [HACKERS] walsender doesn't send keepalives when writes are pending

Heikki Linnakangas Wed, 05 Mar 2014 08:27:50 -0800

On 02/25/2014 06:41 PM, Robert Haas wrote:

On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund <[email protected]> wrote:

Usually that state will be reached very quickly because before
that we're writing data to the network as fast as it can be read from
disk.


I'm unimpressed.  Even if that is in practice true, making the code
self-consistent is a goal of non-trivial value.  The timing of sending
keep-alives has no business depending on the state of the write queue,
and right now it doesn't.  Your patch would make it depend on that,
mostly by accident AFAICS.

Yeah, the timeout should should be checked regardless of the status ofthe write queue. Per the attached patch.

While looking at this, I noticed that the time we sleep between eachiteration is calculated in a funny way. It's not this patch's fault, butmoving the code made it more glaring. After the patch, it looks like this:

        TimestampTz timeout;
        long            sleeptime = 10000;              /* 10 s */
...
        /*
         * If wal_sender_timeout is active, sleep in smaller increments
         * to not go over the timeout too much. XXX: Why not just sleep
         * until the timeout has elapsed?
         */
        if (wal_sender_timeout > 0)
                sleeptime = 1 + (wal_sender_timeout / 10);

        /* Sleep until something happens or we time out */
...
        WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
                          MyProcPort->sock, sleeptime);
...
        /*
         * Check for replication timeout.  Note we ignore the corner case
         * possibility that the client replied just as we reached the
         * timeout ... he's supposed to reply *before* that.
         */
        timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
                                          wal_sender_timeout);
        if (wal_sender_timeout > 0 && GetCurrentTimestamp() >= timeout)
        {
                ...
        }

The logic was the same before the patch, but I added the XXX commentabove. Why do we sleep in increments of 1/10 of wal_sender_timeout?Originally, the code calculated when the next wakeup should happen, byadding wal_sender_timeout (or replication_timeout, as it was called backthen) to the time of the last reply. Why don't we do that?

The code seems to have just slowly devolved into that. It was changedfrom sleep-until-timeout to sleep 1/10th of wal_sender_timeout by apatch that made walsender send a keep-alive message to the client everytime it wakes up, and I guess the idea was to send a message at aninterval that's 1/10th of the timeout. But that idea is long gone bynow; first, commit da4efa13d801ccc179f1d2c24d8a60c4a2f8ede9 turned offsending the keep-alive messages by default, and then6f60fdd7015b032bf49273c99f80913d57eac284 turned them back on, but sothat it's only sent once when 1/2 of wal_sender_timeout has elapsed.

I think that should be changed back to sleep until the next deadline ofwhen something needs to happen, instead of polling. But that can be aseparate patch.


- Heikki

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 4fcf3d4..5c7456d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1259,6 +1259,27 @@ WalSndLoop(void)
 		}
 
 		/*
+		 * If half of wal_sender_timeout has lapsed without receiving any
+		 * reply from standby, send a keep-alive message requesting an
+		 * immediate reply.
+		 */
+		if (wal_sender_timeout > 0 && !ping_sent)
+		{
+			TimestampTz timeout;
+
+			timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+												  wal_sender_timeout / 2);
+			if (GetCurrentTimestamp() >= timeout)
+			{
+				WalSndKeepalive(true);
+				ping_sent = true;
+				/* Try to flush pending output to the client */
+				if (pq_flush_if_writable() != 0)
+					goto send_failure;
+			}
+		}
+
+		/*
 		 * We don't block if not caught up, unless there is unsent data
 		 * pending in which case we'd better block until the socket is
 		 * write-ready.  This test is only needed for the case where XLogSend
@@ -1267,7 +1288,7 @@ WalSndLoop(void)
 		 */
 		if ((caughtup && !streamingDoneSending) || pq_is_send_pending())
 		{
-			TimestampTz timeout = 0;
+			TimestampTz timeout;
 			long		sleeptime = 10000;		/* 10 s */
 			int			wakeEvents;
 
@@ -1276,32 +1297,14 @@ WalSndLoop(void)
 
 			if (pq_is_send_pending())
 				wakeEvents |= WL_SOCKET_WRITEABLE;
-			else if (wal_sender_timeout > 0 && !ping_sent)
-			{
-				/*
-				 * If half of wal_sender_timeout has lapsed without receiving
-				 * any reply from standby, send a keep-alive message to
-				 * standby requesting an immediate reply.
-				 */
-				timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
-													  wal_sender_timeout / 2);
-				if (GetCurrentTimestamp() >= timeout)
-				{
-					WalSndKeepalive(true);
-					ping_sent = true;
-					/* Try to flush pending output to the client */
-					if (pq_flush_if_writable() != 0)
-						goto send_failure;
-				}
-			}
 
-			/* Determine time until replication timeout */
+			/*
+			 * If wal_sender_timeout is active, sleep in smaller increments
+			 * to not go over the timeout too much. XXX: Why not just sleep
+			 * until the timeout has elapsed?
+			 */
 			if (wal_sender_timeout > 0)
-			{
-				timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
-													  wal_sender_timeout);
 				sleeptime = 1 + (wal_sender_timeout / 10);
-			}
 
 			/* Sleep until something happens or we time out */
 			ImmediateInterruptOK = true;
@@ -1315,6 +1318,8 @@ WalSndLoop(void)
 			 * possibility that the client replied just as we reached the
 			 * timeout ... he's supposed to reply *before* that.
 			 */
+			timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+												  wal_sender_timeout);
 			if (wal_sender_timeout > 0 && GetCurrentTimestamp() >= timeout)
 			{
 				/*

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] walsender doesn't send keepalives when writes are pending

Reply via email to