From: Nikolay Samokhvalov <nik@postgres.ai>
Date: Mon, 10 Jul 2023 20:07:18 +0000
Subject: [PATCH] Improve major upgrade docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Recipe involving pg_upgrade -k and rsync --size-only can be dangerous
– clarify what to do to avoid standby corruption.

Discussion: https://www.postgresql.org/message-id/flat/CAM527d8heqkjG5VrvjU3Xjsqxg41ufUyabD9QZccdAxnpbRH-Q%40mail.gmail.com
---
 doc/src/sgml/ref/pgupgrade.sgml | 34 +++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7816b4c6859..03f93c55e49 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
     </para>
 
     <para>
-     Streaming replication and log-shipping standby servers can
+     Streaming replication and log-shipping standby servers must
      remain running until a later step.
     </para>
    </step>
 
-   <step>
+   <step id="pgupgrade-step-prepare-standbys">
     <title>Prepare for standby server upgrades</title>
 
     <para>
-     If you are upgrading standby servers using methods outlined in section <xref
-     linkend="pgupgrade-step-replicas"/>, verify that the old standby
-     servers are caught up by running <application>pg_controldata</application>
-     against the old primary and standby clusters.  Verify that the
-     <quote>Latest checkpoint location</quote> values match in all clusters.
-     (There will be a mismatch if old standby servers were shut down
-     before the old primary or if the old standby servers are still running.)
+     If you are upgrading standby servers using methods outlined in 
+     <xref linkend="pgupgrade-step-replicas"/>, ensure that they were running when 
+     you shut down the primaries in the previous step, so all the latest changes 
+     and the shutdown checkpoint record were received. You can verify this by running 
+     <application>pg_controldata</application> against the old primary and standby 
+     clusters. The <quote>Latest checkpoint location</quote> values must match in all 
+     nodes. A mismatch might occur if old standby servers were shut down before 
+     the old primary. To fix a mismatch, start all old servers and return to the 
+     previous step; proceeding with mismatched 
+     <quote>Latest checkpoint location</quote> may lead to standby corruption.
+    </para>
+
+    <para>
      Also, make sure <varname>wal_level</varname> is not set to
      <literal>minimal</literal> in the <filename>postgresql.conf</filename> file on the
      new primary cluster.
@@ -497,7 +503,6 @@ pg_upgrade.exe
      linkend="warm-standby"/>) standby servers, you can follow these steps to
      quickly upgrade them.  You will not be running <application>pg_upgrade</application> on
      the standby servers, but rather <application>rsync</application> on the primary.
-     Do not start any servers yet.
     </para>
 
     <para>
@@ -508,6 +513,15 @@ pg_upgrade.exe
      is running.
     </para>
 
+    <para>
+     Before running rsync, to avoid standby corruption, it is absolutely
+     critical to ensure that both primaries are shut down and standbys 
+     have received the last changes (see <xref linkend="pgupgrade-step-prepare-standbys"/>). 
+     Standbys can be running at this point or fully stopped. If they 
+     are still running, you can stop, upgrade, and start them one by one; this
+     can be useful to keep the cluster open for read-only transactions.
+    </para>
+
     <procedure>
 
      <step>
--