On 2020-11-19 20:17, Sergei Kornilov wrote:
Seems WAIT_EVENT_RECOVERY_PAUSE addition was lost during patch simplification.

added

                ereport(FATAL,
                                (errmsg("recovery aborted because of insufficient parameter settings"),
                                 errhint("You can restart the server after making the necessary configuration changes.")));

I think we should repeat the conflicting param_name and minValue here. 
pg_wal_replay_resume can be called days after recovery was paused, and the 
initial message can be difficult to find.

done


errmsg("recovery will be paused")

Maybe use the same "recovery has paused" as in recoveryPausesHere? It doesn't 
seem to make any difference since we set pause right after that, but there will be a 
little less work for translators.

done

Not sure about "If recovery is unpaused". The word "resumed" seems to be the one 
usually used in the docs.

I think I like "unpaused" better here, because "resumed" would seem to imply that recovery can actually continue.
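
To make the distinction concrete, here is a rough standalone sketch of the control flow in the patched RecoveryRequiresIntParameter(). It is not the server code: recovery_is_paused(), polls_until_unpause, and check_int_parameter() are hypothetical stand-ins, printf replaces ereport, and the one-second sleep and interrupt handling are omitted. The point is only that leaving the pause loop leads to the fatal exit, never back into replay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the server's pause state; not PostgreSQL APIs. */
static bool recovery_paused = false;
static int  polls_until_unpause = 3;    /* simulates an admin resuming replay later */

typedef enum { PARAM_OK, PARAM_ABORTED } ParamCheckResult;

/* Reports "paused" for a fixed number of polls, then clears the pause flag. */
static bool
recovery_is_paused(void)
{
    if (recovery_paused && polls_until_unpause > 0)
    {
        polls_until_unpause--;
        return true;
    }
    recovery_paused = false;
    return false;
}

/*
 * Mirrors the shape of the patched check: warn, pause, wait, then abort.
 * *polls receives the number of loop iterations spent waiting.
 */
static ParamCheckResult
check_int_parameter(const char *name, int curr, int min, int *polls)
{
    assert(name != NULL && polls != NULL);
    *polls = 0;

    if (curr >= min)
        return PARAM_OK;        /* setting is sufficient, replay continues */

    printf("WARNING:  %s = %d is a lower setting than on the primary server, "
           "where its value was %d.\n", name, curr, min);

    recovery_paused = true;
    printf("LOG:  recovery has paused\n");

    /* The real loop sleeps 1 s per iteration and handles interrupts. */
    while (recovery_is_paused())
        (*polls)++;

    /* Unpausing does not resume replay; the server would exit here. */
    printf("FATAL:  recovery aborted because of insufficient parameter settings\n");
    return PARAM_ABORTED;
}
```

Calling check_int_parameter("max_connections", 80, 100, &polls) in this sketch walks through the warning, the simulated pause, and the abort path; a sufficient value returns immediately without pausing.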

One thing that has not been added to my patch is the equivalent of 496ee647ecd2917369ffcf1eaa0b2cdca07c8730, which allows promotion while recovery is paused. I'm not sure that would be necessary, and it doesn't look easy to add either.

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
From 99724e2ee14b5f3ec926c7afdc056863a7e2294f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Fri, 20 Nov 2020 13:39:09 +0100
Subject: [PATCH v5] Pause recovery for insufficient parameter settings

When certain parameters are changed on a physical replication primary,
this is communicated to standbys using the XLOG_PARAMETER_CHANGE WAL
record.  The standby then checks whether its own settings are at least
as big as the ones on the primary.  If not, the standby shuts down
with a fatal error.

This patch changes this behavior to pause recovery at that point
instead.  That allows read traffic on the standby to continue while
database administrators figure out next steps.  When recovery is
unpaused, the server shuts down (as before).  The idea is to fix the
parameters while recovery is paused and then restart when there is a
maintenance window.

Discussion: https://www.postgresql.org/message-id/flat/4ad69a4c-cc9b-0dfe-0352-8b1b0cd36...@2ndquadrant.com
---
 doc/src/sgml/high-availability.sgml | 48 +++++++++++++++++++++--------
 src/backend/access/transam/xlog.c   | 38 ++++++++++++++++++++---
 2 files changed, 69 insertions(+), 17 deletions(-)

diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 19d7bd2b28..e9a30dd88b 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -2120,18 +2120,14 @@ <title>Administrator's Overview</title>
    </para>
 
    <para>
-    The setting of some parameters on the standby will need reconfiguration
-    if they have been changed on the primary. For these parameters,
-    the value on the standby must
-    be equal to or greater than the value on the primary.
-    Therefore, if you want to increase these values, you should do so on all
-    standby servers first, before applying the changes to the primary server.
-    Conversely, if you want to decrease these values, you should do so on the
-    primary server first, before applying the changes to all standby servers.
-    If these parameters
-    are not set high enough then the standby will refuse to start.
-    Higher values can then be supplied and the server
-    restarted to begin recovery again.  These parameters are:
+    The settings of some parameters determine the size of shared memory for
+    tracking transaction IDs, locks, and prepared transactions.  These shared
+    memory structures must be no smaller on a standby than on the primary in
+    order to ensure that the standby does not run out of shared memory during
+    recovery.  For example, if the primary had used a prepared transaction but
+    the standby had not allocated any shared memory for tracking prepared
+    transactions, then recovery could not continue until the standby's
+    configuration is changed.  The parameters affected are:
 
       <itemizedlist>
        <listitem>
@@ -2160,6 +2156,34 @@ <title>Administrator's Overview</title>
         </para>
        </listitem>
       </itemizedlist>
+
+    The easiest way to ensure this does not become a problem is to have these
+    parameters set on the standbys to values equal to or greater than on the
+    primary.  Therefore, if you want to increase these values, you should do
+    so on all standby servers first, before applying the changes to the
+    primary server.  Conversely, if you want to decrease these values, you
+    should do so on the primary server first, before applying the changes to
+    all standby servers.  Keep in mind that when a standby is promoted, it
+    becomes the new reference for the required parameter settings for the
+    standbys that follow it.  Therefore, to avoid this becoming a problem
+    during a switchover or failover, it is recommended to keep these settings
+    the same on all standby servers.
+   </para>
+
+   <para>
+    The WAL tracks changes to these parameters on the
+    primary, and if a standby processes WAL that indicates that the current
+    value on the primary is higher than its own value, it will log a warning
+    and pause recovery, for example:
+<screen>
+WARNING:  hot standby is not possible because of insufficient parameter settings
+DETAIL:  max_connections = 80 is a lower setting than on the primary server, where its value was 100.
+LOG:  recovery has paused
+DETAIL:  If recovery is unpaused, the server will shut down.
+HINT:  You can then restart the server after making the necessary configuration changes.
+</screen>
+    At that point, the settings on the standby need to be updated and the
+    instance restarted before recovery can continue.
    </para>
 
    <para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 13f1d8c3dc..a3519d50e4 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6232,12 +6232,40 @@ static void
 RecoveryRequiresIntParameter(const char *param_name, int currValue, int minValue)
 {
        if (currValue < minValue)
-               ereport(ERROR,
+       {
+               ereport(WARNING,
                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-                                errmsg("hot standby is not possible because %s = %d is a lower setting than on the primary server (its value was %d)",
-                                               param_name,
-                                               currValue,
-                                               minValue)));
+                                errmsg("hot standby is not possible because of insufficient parameter settings"),
+                                errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
+                                                  param_name,
+                                                  currValue,
+                                                  minValue)));
+
+               SetRecoveryPause(true);
+
+               ereport(LOG,
+                               (errmsg("recovery has paused"),
+                                errdetail("If recovery is unpaused, the server will shut down."),
+                                errhint("You can then restart the server after making the necessary configuration changes.")));
+
+               while (RecoveryIsPaused())
+               {
+                       HandleStartupProcInterrupts();
+                       pgstat_report_wait_start(WAIT_EVENT_RECOVERY_PAUSE);
+                       pg_usleep(1000000L);    /* 1000 ms */
+                       pgstat_report_wait_end();
+               }
+
+               ereport(FATAL,
+                               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                                errmsg("recovery aborted because of insufficient parameter settings"),
+                                /* Repeat the detail from above so it's easy to find in the log. */
+                                errdetail("%s = %d is a lower setting than on the primary server, where its value was %d.",
+                                                  param_name,
+                                                  currValue,
+                                                  minValue),
+                                errhint("You can restart the server after making the necessary configuration changes.")));
+       }
 }
 
 /*
-- 
2.29.2
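
The ordering rule described in the documentation hunk above (raise values on the standbys first, lower them on the primary first) can be sketched as a small helper. This is only an illustration under that reading of the text; change_order() and standby_setting_sufficient() are hypothetical names, not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Which side should receive a parameter change first. */
typedef enum { NO_CHANGE, STANDBYS_FIRST, PRIMARY_FIRST } ChangeOrder;

/*
 * Increases go to the standbys first so they are never below the primary;
 * decreases go to the primary first for the same reason.
 */
static ChangeOrder
change_order(int old_value, int new_value)
{
    if (new_value > old_value)
        return STANDBYS_FIRST;
    if (new_value < old_value)
        return PRIMARY_FIRST;
    return NO_CHANGE;
}

/* The condition the standby checks: its value must be at least the primary's. */
static bool
standby_setting_sufficient(int standby_value, int primary_value)
{
    return standby_value >= primary_value;
}
```

With the values from the sample log output, standby_setting_sufficient(80, 100) is false, which is the case where the patched standby pauses recovery.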
