Re: [HACKERS] Patch for fail-back without fresh backup

Sameer Thakur Fri, 20 Sep 2013 03:13:39 -0700

>>
> >Attached patch combines documentation patch and source-code patch.
>


I have had a stab at reviewing the documentation. Have a look.

--- a/doc/src/sgml/config.sgml

+++ b/doc/src/sgml/config.sgml

@@ -1749,6 +1749,50 @@ include 'filename'

       </listitem>

      </varlistentry>

+     <varlistentry id="guc-synchronous-transfer" xreflabel="synchronous_transfer">

+      <term><varname>synchronous_transfer</varname> (<type>enum</type>)</term>

+      <indexterm>

+       <primary><varname>synchronous_transfer</> configuration parameter</primary>

+      </indexterm>

+      <listitem>

+       <para>

+        This parameter controls which WAL transfers are synchronous and

+        thereby maintains file-system-level consistency between the master

+        server and the standby server. It specifies whether the master server

+        will hold back a file-system-level change (for example, modifying a

+        data page) until the corresponding WAL records have been replicated

+        to the standby server.

+       </para>

+       <para>

+        Valid values are <literal>commit</>, <literal>data_flush</> and

+        <literal>all</>. The default value is <literal>commit</>, meaning

+        that the master waits only for transaction commits. This is

+        equivalent to turning off the <literal>synchronous_transfer</>

+        parameter, and the standby server behaves as a

+        <quote>synchronous standby</> in streaming replication. With

+        <literal>data_flush</>, the master waits only for data page

+        modifications and not for transaction commits, so the standby

+        server acts as an <quote>asynchronous failback safe standby</>.

+        With <literal>all</>, the master waits for data page modifications

+        as well as for transaction commits, and the standby server acts as

+        a <quote>synchronous failback safe standby</>. The wait is on

+        background activities and hence will not create performance

+        overhead. To configure a synchronous failback safe standby,

+        <xref linkend="guc-synchronous-standby-names"> must also be set.

+       </para>

+      </listitem>

+     </varlistentry>



@@ -2258,14 +2302,25 @@ include 'filename'</indexterm>

       <listitem>

        <para>

-        Specifies a comma-separated list of standby names that can support

-        <firstterm>synchronous replication</>, as described in

-        <xref linkend="synchronous-replication">.

-        At any one time there will be at most one active synchronous standby;

-        transactions waiting for commit will be allowed to proceed after

-        this standby server confirms receipt of their data.

-        The synchronous standby will be the first standby named in this list

-        that is both currently connected and streaming data in real-time

+        Specifies a comma-separated list of standby names. If this parameter

+        is set, the standby behaves either as a synchronous standby in

+        replication, as described in <xref linkend="synchronous-replication">,

+        or as a synchronous failback safe standby, as described in

+        <xref linkend="failback-safe">. At any one time there will be at most

+        one active synchronous standby. When the standby is a synchronous

+        standby in replication, transactions waiting for commit are allowed

+        to proceed after this standby server confirms receipt of their data.

+        When the standby is a synchronous failback safe standby, data page

+        modifications as well as transaction commits are allowed to proceed

+        only after this standby server confirms receipt of their data.

+        If this parameter is empty and

+        <xref linkend="guc-synchronous-transfer"> is set to

+        <literal>data_flush</>, the standby is called an asynchronous

+        failback safe standby, and only data page modifications wait until

+        the corresponding WAL record is replicated to the standby.

+       </para>

+       <para>

+        The synchronous standby in replication will be the first standby

+        named in this list that is both currently connected and streaming

+        data in real-time

         (as shown by a state of <literal>streaming</literal> in the

         <link linkend="monitoring-stats-views-table">

         <literal>pg_stat_replication</></link> view).





--- a/doc/src/sgml/high-availability.sgml

+++ b/doc/src/sgml/high-availability.sgml

+

+  <sect2 id="failback-safe">

+     <title>Setting up failback safe standby</title>

+

+   <indexterm zone="high-availability">

+       <primary>Setting up failback safe standby</primary>

+   </indexterm>

+

+   <para>

+    PostgreSQL streaming replication offers durability, but if the master

+    crashes before a particular WAL record reaches the standby server, that

+    WAL record is present on the master server but not on the standby server.

+    In such a case the master is ahead of the standby server in terms of WAL

+    records and data in the database. This leads to file-system-level

+    inconsistency between the master and the standby server. For example,

+    a heap page update on the master might not have been reflected on the

+    standby when the master crashes.

+   </para>

+

+   <para>

+    Due to this inconsistency, a fresh backup of the new master onto the new

+    standby is needed to re-prepare the HA cluster. Taking a fresh backup can

+    be very time-consuming when the database is large. In such a case,

+    disaster recovery can take a very long time if streaming replication is

+    used to set up the high availability cluster.

+   </para>

+

+   <para>

+    If the HA cluster is configured with a failback safe standby, this fresh

+    backup can be avoided. The <xref linkend="guc-synchronous-transfer">

+    parameter controls all WAL transfers, and the master will not make any

+    file-system-level change until it gets a confirmation from the standby

+    server. This avoids the need for a fresh backup by maintaining

+    consistency.

+   </para>

+

+   <sect3 id="Failback-safe-config">

+    <title>Basic Configuration</title>

+   <para>

+    A failback safe standby can be asynchronous or synchronous in nature,

+    depending on whether the master waits for transaction commits.

+    By default the failback safe mechanism is turned off.

+   </para>

+

+   <para>

+    The first step in configuring HA with a failback safe standby is to set

+    up streaming replication. Configuring a synchronous failback safe standby

+    requires setting <xref linkend="guc-synchronous-transfer"> to

+    <literal>all</> and <xref linkend="guc-synchronous-standby-names">

+    to a non-empty value. This configuration causes each commit and each

+    data page modification to wait for confirmation that the standby has

+    written the corresponding WAL record to durable storage. Configuring an

+    asynchronous failback safe standby requires only setting

+    <xref linkend="guc-synchronous-transfer"> to <literal>data_flush</>.

+    This configuration causes only data page modifications to wait

+    for confirmation that the standby has written the corresponding WAL

+    record to durable storage.

+   </para>

+

+  </sect3>

+  </sect2>

   </sect1>



   <sect1 id="warm-standby-failover">

    </para>



    <para>

-    So, switching from primary to standby server can be fast but requires

-    some time to re-prepare the failover cluster. Regular switching from

-    primary to standby is useful, since it allows regular downtime on

-    each system for maintenance. This also serves as a test of the

-    failover mechanism to ensure that it will really work when you need it.

-    Written administration procedures are advised.

+    At the time of failover there is a possibility of file-system-level

+    inconsistency between the old primary and the old standby server, and

+    hence a fresh backup from the new master onto the old master is needed

+    to configure the old primary server as a new standby server. Without

+    a fresh backup, streaming replication does not start successfully even

+    if the new standby starts. Taking a backup can be fast for smaller

+    databases, but for a large database it takes more time to re-prepare

+    the failover cluster in a streaming replication configuration.

+    This could break the service level agreement for crash

+    recovery. The need for a fresh backup and the problem of long

+    recovery time can be avoided if the HA cluster is configured with a

+    failback safe standby; see <xref linkend="failback-safe">.

+    A failback safe standby makes WAL transfer synchronous where required,

+    maintaining file-system-level consistency between the master and the

+    standby server without requiring a backup to be taken on the old master.

+   </para>

+

+   <para>

+    Regular switching from primary to standby is useful, since it allows

+    regular downtime on each system for maintenance. This also serves as

+    a test of the failover mechanism to ensure that it will really work

+    when you need it. Written administration procedures are advised.

    </para>



    <para>

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml

index 2af1738..da3820f 100644

--- a/doc/src/sgml/perform.sgml

+++ b/doc/src/sgml/perform.sgml

       </para>

      </listitem>

+

+     <listitem>

+      <para>

+       Set <xref linkend="guc-synchronous-transfer"> to

+       <literal>commit</>; there is no need to guard against database

+       inconsistency between master and standby during failover.

+      </para>
+     </listitem>
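
As a side note, the two setups described under "Basic Configuration" could be sketched in postgresql.conf roughly as follows. This is only an illustration and assumes the patch under review is applied: synchronous_transfer and its values (commit, data_flush, all) come from the patch, and the standby name standby1 is a made-up placeholder.

```conf
# postgresql.conf on the master (sketch only; needs the patch under review)

# Synchronous failback safe standby: transaction commits and data page
# modifications both wait for confirmation from the standby.
synchronous_transfer = all
synchronous_standby_names = 'standby1'   # hypothetical standby name

# Asynchronous failback safe standby: only data page modifications wait.
# Uncomment these two lines instead of the pair above.
#synchronous_transfer = data_flush
#synchronous_standby_names = ''
```

Leaving synchronous_transfer at its default of commit keeps the failback safe mechanism off, as the proposed text describes.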
