Hi, On Mon, Jul 10, 2023 at 01:36:39PM -0700, Nikolay Samokhvalov wrote: > On Fri, Jul 7, 2023 at 6:31 AM Stephen Frost <sfr...@snowman.net> wrote: > > * Nikolay Samokhvalov (n...@postgres.ai) wrote: > > > But this can happen with anyone who follows the procedure from the docs > > as > > > is and doesn't do any additional steps, because in step 9 "Prepare for > > > standby server upgrades": > > > > > > 1) there is no requirement to follow specific order to shut down the > > nodes > > > - "Streaming replication and log-shipping standby servers can remain > > > running until a later step" should probably be changed to a > > > requirement-like "keep them running" > > > > Agreed that it would be good to clarify that the primary should be shut > > down first, to make sure everything written by the primary has been > > replicated to all of the replicas. > > Thanks! > > Here is a patch to fix the existing procedure description.
Thanks for that! > I agree with Andrey – without it, we don't have any good way to upgrade > large clusters in short time. Default rsync mode (without "--size-only") > takes a lot of time too, if the load is heavy. > > With these adjustments, can "rsync --size-only" remain in the docs as the > *fast* and safe method to upgrade standbys, or there are still some > concerns related to corruption risks? I hope somebody can answer that definitively, but I read Stephen's mail to indicate that this procedure should be safe in principle (if you know what you are doing). > From: Nikolay Samokhvalov <n...@postgres.ai> > Date: Mon, 10 Jul 2023 20:07:18 +0000 > Subject: [PATCH] Improve major upgrade docs Maybe mention standby here, like "Improve major upgrade documentation for standby servers". > +++ b/doc/src/sgml/ref/pgupgrade.sgml > @@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion; > </para> > > <para> > - Streaming replication and log-shipping standby servers can > + Streaming replication and log-shipping standby servers must > remain running until a later step. > </para> > </step> > > - <step> > + <step id="pgupgrade-step-prepare-standbys"> > > <para> > - If you are upgrading standby servers using methods outlined in section > <xref > - linkend="pgupgrade-step-replicas"/>, verify that the old standby > - servers are caught up by running > <application>pg_controldata</application> > - against the old primary and standby clusters. Verify that the > - <quote>Latest checkpoint location</quote> values match in all clusters. > - (There will be a mismatch if old standby servers were shut down > - before the old primary or if the old standby servers are still running.) > + If you are upgrading standby servers using methods outlined in > + <xref linkend="pgupgrade-step-replicas"/>, You dropped the "section" before the xref, I think that should be kept around. > + ensure that they were > running when > + you shut down the primaries in the previous step, so all the latest > changes You talk of primaries in plural here, that is a bit weird for PostgreSQL documentation. > + and the shutdown checkpoint record were received. You can verify this > by running > + <application>pg_controldata</application> against the old primary and > standby > + clusters. The <quote>Latest checkpoint location</quote> values must > match in all > + nodes. A mismatch might occur if old standby servers were shut down > before > + the old primary. To fix a mismatch, start all old servers and return to > the > + previous step; proceeding with mismatched > + <quote>Latest checkpoint location</quote> may lead to standby > corruption. > + </para> > + > + <para> > Also, make sure <varname>wal_level</varname> is not set to > <literal>minimal</literal> in the <filename>postgresql.conf</filename> > file on the > new primary cluster. > @@ -497,7 +503,6 @@ pg_upgrade.exe > linkend="warm-standby"/>) standby servers, you can follow these steps to > quickly upgrade them. You will not be running > <application>pg_upgrade</application> on > the standby servers, but rather <application>rsync</application> on the > primary. > - Do not start any servers yet. > </para> > > <para> > @@ -508,6 +513,15 @@ pg_upgrade.exe > is running. > </para> > > + <para> > + Before running rsync, to avoid standby corruption, it is absolutely > + critical to ensure that both primaries are shut down and standbys > + have received the last changes (see <xref > linkend="pgupgrade-step-prepare-standbys"/>). I think this should be something like "ensure both that the primary is shut down and that the standbys have received all the changes". > + Standbys can be running at this point or fully stopped. "or be fully stopped." I think. > + If they > + are still running, you can stop, upgrade, and start them one by one; > this > + can be useful to keep the cluster open for read-only transactions. > + </para> Maybe this is clear from the context, but "upgrade" in the above should maybe more explicitly refer to the rsync method or else people might think one can run pg_upgrade on them after all? Michael