Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Andres Freund Mon, 19 Jan 2015 08:57:59 -0800

On 2015-01-19 17:16:11 +0900, Michael Paquier wrote:
> On Mon, Jan 19, 2015 at 4:10 PM, Michael Paquier
> <michael.paqu...@gmail.com> wrote:
> > On Sat, Jan 17, 2015 at 2:44 AM, Andres Freund <and...@2ndquadrant.com> 
> > wrote:
> >> Not this patch's fault, but I'm getting a bit tired seeing the above open
> >> coded. How about adding a function that does the sleeping based on a
> >> timestamptz and a ms interval?
> > You mean in plugins, right? I don't recall seeing similar patterns in
> > other code paths of backend. But I think that we can use something
> > like that in timestamp.c then because we need to leverage that between
> > two timestamps, the last failure and now():
> > TimestampSleepDifference(start_time, stop_time, internal_ms);
> > Perhaps you have something else in mind?
> >
> > Attached is an updated patch.


> Actually I came with better than last patch by using a boolean flag as
> return value of TimestampSleepDifference and use
> TimestampDifferenceExceeds directly inside it.

> Subject: [PATCH] Add wal_availability_check_interval to control WAL fetching
>  on failure

I think that name isn't a very good. And its isn't very accurate
either.

How about wal_retrieve_retry_interval? Not very nice, but I think it's
still better than the above.


> +     <varlistentry id="wal-availability-check-interval" 
> xreflabel="wal_availability_check_interval">
> +      <term><varname>wal_availability_check_interval</varname> 
> (<type>integer</type>)
> +      <indexterm>
> +        <primary><varname>wal_availability_check_interval</> recovery 
> parameter</primary>
> +      </indexterm>
> +      </term>
> +      <listitem>
> +       <para>
> +        This parameter specifies the amount of time to wait when
> +        WAL is not available for a node in recovery. Default value is
> +        <literal>5s</>.
> +       </para>
> +       <para>
> +        A node in recovery will wait for this amount of time if
> +        <varname>restore_command</> returns nonzero exit status code when
> +        fetching new WAL segment files from archive or when a WAL receiver
> +        is not able to fetch a WAL record when using streaming replication.
> +       </para>
> +      </listitem>
> +     </varlistentry>
> +
>      </variablelist>

Walreceiver doesn't wait that amount, but rather how long the connection
is intact. And restore_command may or may not retry.

>       /*-------
>        * Standby mode is implemented by a state machine:
> @@ -10490,15 +10511,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
> randAccess,
>                                        * machine, so we've exhausted all the 
> options for
>                                        * obtaining the requested WAL. We're 
> going to loop back
>                                        * and retry from the archive, but if 
> it hasn't been long
> -                                      * since last attempt, sleep 5 seconds 
> to avoid
> -                                      * busy-waiting.
> +                                      * since last attempt, sleep the amount 
> of time specified
> +                                      * by wal_availability_check_interval 
> to avoid busy-waiting.
>                                        */
> -                                     now = (pg_time_t) time(NULL);
> -                                     if ((now - last_fail_time) < 5)
> -                                     {
> -                                             pg_usleep(1000000L * (5 - (now 
> - last_fail_time)));
> -                                             now = (pg_time_t) time(NULL);
> -                                     }
> +                                     now = GetCurrentTimestamp();
> +                                     if 
> (TimestampSleepDifference(last_fail_time, now,
> +                                                                             
>         wal_availability_check_interval))
> +                                             now = GetCurrentTimestamp();

Not bad, that's much easier to read imo.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

Reply via email to