Re: pg_upgrade and logical replication

Amit Kapila Wed, 14 Feb 2024 23:19:37 -0800

On Wed, Feb 14, 2024 at 9:07 AM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> > pg_upgrade/t/004_subscription.pl says
> >
> > |my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
> >
> > ..but I think maybe it should not.
> >
> > When you try to use --link, it fails:
> > https://cirrus-ci.com/task/4669494061170688
> >
> > |Adding ".old" suffix to old global/pg_control                 ok
> > |
> > |If you want to start the old cluster, you will need to remove
> > |the ".old" suffix from
> > /tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_su
> > bscription_old_sub_data/pgdata/global/pg_control.old.
> > |Because "link" mode was used, the old cluster cannot be safely
> > |started once the new cluster has been started.
> > |...
> > |
> > |postgres: could not find the database system
> > |Expected to find it in the directory
> > "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> > ubscription_old_sub_data/pgdata",
> > |but could not open file
> > "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> > ubscription_old_sub_data/pgdata/global/pg_control": No such file or 
> > directory
> > |# No postmaster PID for node "old_sub"
> > |[19:36:01.396](0.250s) Bail out!  pg_ctl start failed
> >
>
> Good catch! The primal reason of the failure is to reuse the old cluster, 
> even after
> the successful upgrade. The documentation said [1]:
>
> >
> If you use link mode, the upgrade will be much faster (no file copying) and 
> use less
> disk space, but you will not be able to access your old cluster once you 
> start the new
> cluster after the upgrade.
> >
>
> > You could rename pg_control.old to avoid that immediate error, but that 
> > doesn't
> > address the essential issue that "the old cluster cannot be safely started 
> > once
> > the new cluster has been started."
>
> Yeah, I agreed that it should be avoided to access to the old cluster after 
> the upgrade.
> IIUC, pg_upgrade would be run third times in 004_subscription.
>
> 1. successful upgrade
> 2. failure due to the insufficient max_replication_slot
> 3. failure because the pg_subscription_rel has 'd' state
>
> And old instance is reused in all of runs. Therefore, the most reasonable fix 
> is to
> change the ordering of tests, i.e., "successful upgrade" should be done at 
> last.
>


This sounds like a reasonable way to address the reported problem.
Justin, do let me know if you think otherwise?

Comment:
===========
*
-# Setup an enabled subscription to verify that the running status and failover
-# option are retained after the upgrade.
+# Setup a subscription to verify that the failover option are retained after
+# the upgrade.
 $publisher->safe_psql('postgres', "CREATE PUBLICATION regress_pub1");
 $old_sub->safe_psql('postgres',
- "CREATE SUBSCRIPTION regress_sub1 CONNECTION '$connstr' PUBLICATION
regress_pub1 WITH (failover = true)"
+ "CREATE SUBSCRIPTION regress_sub1 CONNECTION '$connstr' PUBLICATION
regress_pub1 WITH (failover = true, enabled = false)"
 );

I think it is better not to create a subscription in the early stage
which we wanted to use for the success case. Let's have separate
subscriptions for failure and success cases. I think that will avoid
the newly added ALTER statements in the patch.

-- 
With Regards,
Amit Kapila.

Re: pg_upgrade and logical replication

Reply via email to