On 02/25/2014 06:41 PM, Robert Haas wrote:
On Tue, Feb 25, 2014 at 11:23 AM, Andres Freund <and...@2ndquadrant.com> wrote:
Usually that state will be reached very quickly because before
that we're writing data to the network as fast as it can be read from
disk.
I'm unimpressed. Even if that is in practice true, making the code
self-consistent is a goal of non-trivial value. The timing of sending
keep-alives has no business depending on the state of the write queue,
and right now it doesn't. Your patch would make it depend on that,
mostly by accident AFAICS.
Yeah, the timeout should should be checked regardless of the status of
the write queue. Per the attached patch.
While looking at this, I noticed that the time we sleep between each
iteration is calculated in a funny way. It's not this patch's fault, but
moving the code made it more glaring. After the patch, it looks like this:
TimestampTz timeout;
long sleeptime = 10000; /* 10 s */
...
/*
* If wal_sender_timeout is active, sleep in smaller increments
* to not go over the timeout too much. XXX: Why not just sleep
* until the timeout has elapsed?
*/
if (wal_sender_timeout > 0)
sleeptime = 1 + (wal_sender_timeout / 10);
/* Sleep until something happens or we time out */
...
WaitLatchOrSocket(&MyWalSnd->latch, wakeEvents,
MyProcPort->sock, sleeptime);
...
/*
* Check for replication timeout. Note we ignore the corner case
* possibility that the client replied just as we reached the
* timeout ... he's supposed to reply *before* that.
*/
timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
wal_sender_timeout);
if (wal_sender_timeout > 0 && GetCurrentTimestamp() >= timeout)
{
...
}
The logic was the same before the patch, but I added the XXX comment
above. Why do we sleep in increments of 1/10 of wal_sender_timeout?
Originally, the code calculated when the next wakeup should happen, by
adding wal_sender_timeout (or replication_timeout, as it was called back
then) to the time of the last reply. Why don't we do that?
The code seems to have just slowly devolved into that. It was changed
from sleep-until-timeout to sleep 1/10th of wal_sender_timeout by a
patch that made walsender send a keep-alive message to the client every
time it wakes up, and I guess the idea was to send a message at an
interval that's 1/10th of the timeout. But that idea is long gone by
now; first, commit da4efa13d801ccc179f1d2c24d8a60c4a2f8ede9 turned off
sending the keep-alive messages by default, and then
6f60fdd7015b032bf49273c99f80913d57eac284 turned them back on, but so
that it's only sent once when 1/2 of wal_sender_timeout has elapsed.
I think that should be changed back to sleep until the next deadline of
when something needs to happen, instead of polling. But that can be a
separate patch.
- Heikki
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 4fcf3d4..5c7456d 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1259,6 +1259,27 @@ WalSndLoop(void)
}
/*
+ * If half of wal_sender_timeout has lapsed without receiving any
+ * reply from standby, send a keep-alive message requesting an
+ * immediate reply.
+ */
+ if (wal_sender_timeout > 0 && !ping_sent)
+ {
+ TimestampTz timeout;
+
+ timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+ wal_sender_timeout / 2);
+ if (GetCurrentTimestamp() >= timeout)
+ {
+ WalSndKeepalive(true);
+ ping_sent = true;
+ /* Try to flush pending output to the client */
+ if (pq_flush_if_writable() != 0)
+ goto send_failure;
+ }
+ }
+
+ /*
* We don't block if not caught up, unless there is unsent data
* pending in which case we'd better block until the socket is
* write-ready. This test is only needed for the case where XLogSend
@@ -1267,7 +1288,7 @@ WalSndLoop(void)
*/
if ((caughtup && !streamingDoneSending) || pq_is_send_pending())
{
- TimestampTz timeout = 0;
+ TimestampTz timeout;
long sleeptime = 10000; /* 10 s */
int wakeEvents;
@@ -1276,32 +1297,14 @@ WalSndLoop(void)
if (pq_is_send_pending())
wakeEvents |= WL_SOCKET_WRITEABLE;
- else if (wal_sender_timeout > 0 && !ping_sent)
- {
- /*
- * If half of wal_sender_timeout has lapsed without receiving
- * any reply from standby, send a keep-alive message to
- * standby requesting an immediate reply.
- */
- timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
- wal_sender_timeout / 2);
- if (GetCurrentTimestamp() >= timeout)
- {
- WalSndKeepalive(true);
- ping_sent = true;
- /* Try to flush pending output to the client */
- if (pq_flush_if_writable() != 0)
- goto send_failure;
- }
- }
- /* Determine time until replication timeout */
+ /*
+ * If wal_sender_timeout is active, sleep in smaller increments
+ * to not go over the timeout too much. XXX: Why not just sleep
+ * until the timeout has elapsed?
+ */
if (wal_sender_timeout > 0)
- {
- timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
- wal_sender_timeout);
sleeptime = 1 + (wal_sender_timeout / 10);
- }
/* Sleep until something happens or we time out */
ImmediateInterruptOK = true;
@@ -1315,6 +1318,8 @@ WalSndLoop(void)
* possibility that the client replied just as we reached the
* timeout ... he's supposed to reply *before* that.
*/
+ timeout = TimestampTzPlusMilliseconds(last_reply_timestamp,
+ wal_sender_timeout);
if (wal_sender_timeout > 0 && GetCurrentTimestamp() >= timeout)
{
/*
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers