On Sat, Nov 24, 2018 at 11:29:22AM -0500, Jeff Janes wrote:
> I noticed that the existing codebase does not have a consensus on what to
> pass to WaitLatch for the timeout when the timeout isn't relevant. I picked
> 0, but -1L also has precedent.

WaitLatch enforces the timeout to -1 if WL_TIMEOUT is not given to the
caller, so setting a value does not matter much.  If WL_TIMEOUT is
defined, the code would blow up on an assertion if the timeout is
negative.  In short I would let the timeout be set to 5000L, but just
add the flag on the fly instead of having a if/else with two calls of
WaitLatch() as your patch does.

"git diff --check" complains about a whitespace issue.

Anyway, the timeout is useful to have for another thing: we check if
the WAL receiver is still alive thanks to that periodically, and this
even if no promote file is defined.  That's useful for failure
handling if the WAL receiver gets killed so as the startup process can
define more quickly a new source (or just create a new WAL sender) it
should use for feeding with fresh WAL data, no?

It seems to me that the comment on top of WaitLatch should be clearer
about that, and that the current state leads to confusion.  Another
thing coming to my mind is that I think it would be useful to make the
timeout configurable so as instances can react more quickly in the
case of a sudden death of the WAL receiver (or to check faster for a
trigger file if the HA application is to lazy to send a signal to the
standby host).

Attached is a patch to improve the comment for now.

Thoughts?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index c80b14ed97..4452d7566a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12095,7 +12095,8 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 
 					/*
 					 * Wait for more WAL to arrive. Time out after 5 seconds
-					 * to react to a trigger file promptly.
+					 * to react to a trigger file promptly and to check if
+					 * the WAL receiver is still active.
 					 */
 					(void) WaitLatch(&XLogCtl->recoveryWakeupLatch,
 									 WL_LATCH_SET | WL_TIMEOUT |

Attachment: signature.asc
Description: PGP signature

Reply via email to