On Fri, Feb 6, 2015 at 4:58 PM, Fujii Masao wrote:
> timestamp.c:1708: warning: implicit declaration of function
> 'HandleStartupProcInterrupts'
>
> I got the above warning at the compilation.
>
> +        pg_usleep(wait_time);
> +        HandleStartupProcInterrupts();
> +        total_time -= wait_time;
>
> It seems strange to call the startup-process-specific function in
> the "generic" function like TimestampSleepDifference().

I changed the sleep to a WaitLatch and moved all the logic back to
xlog.c, removing at the same time the stuff in timestamp.c because it
seems strange to add even a latch dependency there.

>  * 5. Sleep 5 seconds, and loop back to 1.
>
> In WaitForWALToBecomeAvailable(), the above comment should
> be updated.

Done.

> -  * Wait for more WAL to arrive. Time out after 5 seconds,
> -  * like when polling the archive, to react to a trigger
> -  * file promptly.
> +  * Wait for more WAL to arrive. Time out after the amount of
> +  * time specified by wal_retrieve_retry_interval, like
> +  * when polling the archive, to react to a trigger file promptly.
>    */
>   WaitLatch(&XLogCtl->recoveryWakeupLatch,
>             WL_LATCH_SET | WL_TIMEOUT,
> -           5000L);
> +           wal_retrieve_retry_interval * 1000L);
>
> This change can prevent the startup process from reacting to
> a trigger file. Imagine the case where the large interval is set
> and the user want to promote the standby by using the trigger file
> instead of pg_ctl promote. I think that the sleep time should be 5s
> if the interval is set to more than 5s. Thought?

I disagree here. In some replication setups it is useful to check WAL
availability from a source more often, but the opposite is true as
well: as Alexey mentioned at the beginning of the thread, a longer
interval reduces the number of requests when fetching WAL archives
from external storage such as AWS. Hence a correct solution would be
to check periodically for the trigger file, with a maximum one-time
wait of 5s to ensure backward-compatible behavior. We could reduce it
to 1s or something like that as well.

> +# specifies an optional internal to wait for WAL to become available when
>
> s/internal/interval?

This part has been removed in the new patch, as the parameter is now
set as a GUC.
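
To make the trigger-file point above concrete, here is a minimal,
standalone sketch of the idea (not code from the patch): the total wait
is split into slices of at most 5s and the trigger is checked before
each slice. All names below (next_wait_slice, wait_with_trigger_checks,
sim_state, sim_triggered, sim_wait) are hypothetical; the two callbacks
only stand in for CheckForStandbyTrigger() and WaitLatch(), and the
simulated wait always consumes its full timeout, which a real latch
wait would not necessarily do.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical callbacks standing in for CheckForStandbyTrigger() and
 * WaitLatch(); none of the names in this sketch exist in PostgreSQL.
 */
typedef bool (*trigger_check_fn) (void *arg);
typedef void (*wait_fn) (long timeout_ms, void *arg);

/* Length of the next wait: the remaining time, capped at cap_ms. */
long
next_wait_slice(long remaining_ms, long cap_ms)
{
	assert(cap_ms > 0);
	return (remaining_ms < cap_ms) ? remaining_ms : cap_ms;
}

/*
 * Wait total_ms overall in slices of at most cap_ms, checking the trigger
 * before each slice.  Returns true as soon as the trigger is seen, false
 * once the whole interval has elapsed without one.
 */
bool
wait_with_trigger_checks(long total_ms, long cap_ms,
						 trigger_check_fn triggered, wait_fn do_wait,
						 void *arg)
{
	long		remaining_ms = total_ms;

	while (remaining_ms > 0)
	{
		long		slice_ms = next_wait_slice(remaining_ms, cap_ms);

		if (triggered(arg))
			return true;
		do_wait(slice_ms, arg);
		remaining_ms -= slice_ms;
	}
	return false;
}

/* Simulated clock so the loop can be exercised without really sleeping. */
struct sim_state
{
	long		elapsed_ms;		/* total simulated time waited */
	long		trigger_at_ms;	/* time the trigger appears; -1 for never */
	int			checks;			/* number of trigger checks performed */
};

/* Counts the check and reports whether the simulated trigger has appeared. */
bool
sim_triggered(void *arg)
{
	struct sim_state *s = arg;

	s->checks++;
	return s->trigger_at_ms >= 0 && s->elapsed_ms >= s->trigger_at_ms;
}

/* Advances the simulated clock by the full timeout. */
void
sim_wait(long timeout_ms, void *arg)
{
	struct sim_state *s = arg;

	s->elapsed_ms += timeout_ms;
}
```

With a 60s interval, a 5s cap and a trigger appearing at 7s, this
simulated loop notices the trigger at the 10s check instead of after
the full minute, which is the backward-compatible behavior argued for
above.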

> +        This parameter specifies the amount of time to wait when a failure
> +        occurred when reading WAL from a source (be it via streaming
> +        replication, local <filename>pg_xlog</> or WAL archive) for a node
> +        in standby mode, or when WAL is expected from a source.
>
> At least to me, it seems not easy to understand what the parameter is
> from this description. What about the following, for example?
>
>     This parameter specifies the amount of time to wait when WAL is not
>     available from any sources (streaming replication, local
>     <filename>pg_xlog</> or WAL archive) before retrying to retrieve WAL....

OK, done.

> Isn't it better to place this parameter in postgresql.conf rather than
> recovery.conf? I'd like to change the value of this parameter without
> restarting the server.

Yes, agreed. Done so with a SIGHUP guc.

Regards,
-- 
Michael

From 594db45eda43c0abc13ad0dbca81d6ed3f4650e7 Mon Sep 17 00:00:00 2001
From: Michael Paquier <mich...@otacoo.com>
Date: Mon, 19 Jan 2015 16:08:48 +0900
Subject: [PATCH] Add wal_retrieve_retry_interval

This parameter helps control the interval at which WAL availability is
checked when a node is in recovery, particularly when successive failures
happen when fetching WAL archives, or when fetching WAL records from a
streaming source.

Default value is 5s.
---
 doc/src/sgml/config.sgml                      | 17 +++++
 src/backend/access/transam/xlog.c             | 91 ++++++++++++++++++---------
 src/backend/utils/misc/guc.c                  | 12 ++++
 src/backend/utils/misc/postgresql.conf.sample |  3 +
 src/include/access/xlog.h                     |  1 +
 5 files changed, 93 insertions(+), 31 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bcb106..d82b26a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2985,6 +2985,23 @@ include_dir 'conf.d'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-wal-retrieve-retry-interval" xreflabel="wal_retrieve_retry_interval">
+      <term><varname>wal_retrieve_retry_interval</varname> (<type>integer</type>)
+      <indexterm>
+        <primary><varname>wal_retrieve_retry_interval</> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Specify the amount of time to wait when WAL is not available from
+        any sources (streaming replication, local <filename>pg_xlog</> or
+        WAL archive) before retrying to retrieve WAL.  This parameter can
+        only be set in the <filename>postgresql.conf</> file or on the
+        server command line.  The default value is 5 seconds.
+       </para>
+      </listitem>
+     </varlistentry>
+
      </variablelist>
     </sect2>
    </sect1>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 629a457..769fbf3 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -93,6 +93,7 @@ int			sync_method = DEFAULT_SYNC_METHOD;
 int			wal_level = WAL_LEVEL_MINIMAL;
 int			CommitDelay = 0;	/* precommit delay in microseconds */
 int			CommitSiblings = 5; /* # concurrent xacts needed to sleep */
+int			wal_retrieve_retry_interval = 5000;
 
 #ifdef WAL_DEBUG
 bool		XLOG_DEBUG = false;
@@ -10340,8 +10341,8 @@ static bool
 WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 							bool fetching_ckpt, XLogRecPtr tliRecPtr)
 {
-	static pg_time_t last_fail_time = 0;
-	pg_time_t	now;
+	TimestampTz now = GetCurrentTimestamp();
+	TimestampTz last_fail_time = now;
 
 	/*-------
 	 * Standby mode is implemented by a state machine:
@@ -10351,7 +10352,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * 2. Check trigger file
 	 * 3. Read from primary server via walreceiver (XLOG_FROM_STREAM)
 	 * 4. Rescan timelines
-	 * 5. Sleep 5 seconds, and loop back to 1.
+	 * 5. Sleep the amount of time defined by wal_retrieve_retry_interval
+	 *    while checking periodically for the presence of user-defined
+	 *    trigger file and loop back to 1.
 	 *
 	 * Failure to read from the current source advances the state machine to
 	 * the next state.
@@ -10490,14 +10493,27 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						 * machine, so we've exhausted all the options for
 						 * obtaining the requested WAL. We're going to loop back
 						 * and retry from the archive, but if it hasn't been long
-						 * since last attempt, sleep 5 seconds to avoid
-						 * busy-waiting.
+						 * since last attempt, sleep the amount of time specified
+						 * by wal_retrieve_retry_interval to avoid busy-waiting.
 						 */
-						now = (pg_time_t) time(NULL);
-						if ((now - last_fail_time) < 5)
+						now = GetCurrentTimestamp();
+						if (!TimestampDifferenceExceeds(last_fail_time, now,
+														wal_retrieve_retry_interval))
 						{
-							pg_usleep(1000000L * (5 - (now - last_fail_time)));
-							now = (pg_time_t) time(NULL);
+							long		secs, wait_time;
+							int			microsecs;
+							TimestampDifference(last_fail_time, now, &secs, &microsecs);
+
+							wait_time = wal_retrieve_retry_interval * 1000L -
+								(1000000L * secs + 1L * microsecs);
+							if (wait_time > 5000000L) /* 5s */
+								wait_time = 5000000L;
+
+							WaitLatch(&XLogCtl->recoveryWakeupLatch,
+									  WL_LATCH_SET | WL_TIMEOUT,
+									  wait_time / 1000);
+							ResetLatch(&XLogCtl->recoveryWakeupLatch);
+							now = GetCurrentTimestamp();
 						}
 						last_fail_time = now;
 						currentSource = XLOG_FROM_ARCHIVE;
@@ -10562,6 +10578,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 				case XLOG_FROM_STREAM:
 					{
 						bool		havedata;
+						long		remaining_wait_time;
 
 						/*
 						 * Check if WAL receiver is still active.
@@ -10634,33 +10651,45 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						}
 
 						/*
-						 * Data not here yet. Check for trigger, then wait for
-						 * walreceiver to wake us up when new WAL arrives.
+						 * Wait for more WAL for a total of time defined by
+						 * wal_retrieve_retry_interval and check the presence of a
+						 * user-defined trigger file every 5s to ensure quick
+						 * responsiveness of system.
 						 */
-						if (CheckForStandbyTrigger())
+						remaining_wait_time = wal_retrieve_retry_interval * 1L;
+						while (remaining_wait_time > 0)
 						{
+							long wait_time = 5000L; /* 5s */
+
+							if (remaining_wait_time < wait_time)
+								wait_time = remaining_wait_time;
+
 							/*
-							 * Note that we don't "return false" immediately here.
-							 * After being triggered, we still want to replay all
-							 * the WAL that was already streamed. It's in pg_xlog
-							 * now, so we just treat this as a failure, and the
-							 * state machine will move on to replay the streamed
-							 * WAL from pg_xlog, and then recheck the trigger and
-							 * exit replay.
+							 * Data not here yet. Check for trigger, then wait for
+							 * walreceiver to wake us up when new WAL arrives.
 							 */
-							lastSourceFailed = true;
-							break;
-						}
+							if (CheckForStandbyTrigger())
+							{
+								/*
+								 * Note that we don't "return false" immediately here.
+								 * After being triggered, we still want to replay all
+								 * the WAL that was already streamed. It's in pg_xlog
+								 * now, so we just treat this as a failure, and the
+								 * state machine will move on to replay the streamed
+								 * WAL from pg_xlog, and then recheck the trigger and
+								 * exit replay.
+								 */
+								lastSourceFailed = true;
+								break;
+							}
 
-						/*
-						 * Wait for more WAL to arrive. Time out after 5 seconds,
-						 * like when polling the archive, to react to a trigger
-						 * file promptly.
-						 */
-						WaitLatch(&XLogCtl->recoveryWakeupLatch,
-								  WL_LATCH_SET | WL_TIMEOUT,
-								  5000L);
-						ResetLatch(&XLogCtl->recoveryWakeupLatch);
+							/* wait a bit */
+							WaitLatch(&XLogCtl->recoveryWakeupLatch,
+									  WL_LATCH_SET | WL_TIMEOUT,
+									  wait_time);
+							ResetLatch(&XLogCtl->recoveryWakeupLatch);
+							remaining_wait_time -= wait_time;
+						}
 						break;
 					}
 
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 9572777..c4598ed 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2364,6 +2364,18 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"wal_retrieve_retry_interval", PGC_SIGHUP, WAL_SETTINGS,
+			gettext_noop("Specifies the amount of time to wait when WAL is not "
+						 "available from a source."),
+			NULL,
+			GUC_UNIT_MS
+		},
+		&wal_retrieve_retry_interval,
+		5000, 1, INT_MAX,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"wal_segment_size", PGC_INTERNAL, PRESET_OPTIONS,
 			gettext_noop("Shows the number of pages per write ahead log segment."),
 			NULL,
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b053659..73e2bca 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -260,6 +260,9 @@
 #wal_receiver_timeout = 60s		# time that receiver waits for
 					# communication from master
 					# in milliseconds; 0 disables
+#wal_retrieve_retry_interval = 5s	# time to wait before retrying to
+					# retrieve WAL from a source (streaming
+					# replication, archive or local pg_xlog)
 
 
 #------------------------------------------------------------------------------
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 138deaf..be27a85 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -93,6 +93,7 @@ extern int	CheckPointSegments;
 extern int	wal_keep_segments;
 extern int	XLOGbuffers;
 extern int	XLogArchiveTimeout;
+extern int	wal_retrieve_retry_interval;
 extern bool XLogArchiveMode;
 extern char *XLogArchiveCommand;
 extern bool EnableHotStandby;
-- 
2.3.0

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers