On Wed, Feb 14, 2024 at 9:07 AM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote: > > > pg_upgrade/t/004_subscription.pl says > > > > |my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy'; > > > > ..but I think maybe it should not. > > > > When you try to use --link, it fails: > > https://cirrus-ci.com/task/4669494061170688 > > > > |Adding ".old" suffix to old global/pg_control ok > > | > > |If you want to start the old cluster, you will need to remove > > |the ".old" suffix from > > /tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_su > > bscription_old_sub_data/pgdata/global/pg_control.old. > > |Because "link" mode was used, the old cluster cannot be safely > > |started once the new cluster has been started. > > |... > > | > > |postgres: could not find the database system > > |Expected to find it in the directory > > "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s > > ubscription_old_sub_data/pgdata", > > |but could not open file > > "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s > > ubscription_old_sub_data/pgdata/global/pg_control": No such file or > > directory > > |# No postmaster PID for node "old_sub" > > |[19:36:01.396](0.250s) Bail out! pg_ctl start failed > > > > Good catch! The primal reason of the failure is to reuse the old cluster, > even after > the successful upgrade. The documentation said [1]: > > > > If you use link mode, the upgrade will be much faster (no file copying) and > use less > disk space, but you will not be able to access your old cluster once you > start the new > cluster after the upgrade. > > > > > You could rename pg_control.old to avoid that immediate error, but that > > doesn't > > address the essential issue that "the old cluster cannot be safely started > > once > > the new cluster has been started." > > Yeah, I agreed that it should be avoided to access to the old cluster after > the upgrade. > IIUC, pg_upgrade would be run third times in 004_subscription. > > 1. successful upgrade > 2. failure due to the insufficient max_replication_slot > 3. failure because the pg_subscription_rel has 'd' state > > And old instance is reused in all of runs. Therefore, the most reasonable fix > is to > change the ordering of tests, i.e., "successful upgrade" should be done at > last. >
This sounds like a reasonable way to address the reported problem. Justin, do let me know if you think otherwise? Comment: =========== * -# Setup an enabled subscription to verify that the running status and failover -# option are retained after the upgrade. +# Setup a subscription to verify that the failover option are retained after +# the upgrade. $publisher->safe_psql('postgres', "CREATE PUBLICATION regress_pub1"); $old_sub->safe_psql('postgres', - "CREATE SUBSCRIPTION regress_sub1 CONNECTION '$connstr' PUBLICATION regress_pub1 WITH (failover = true)" + "CREATE SUBSCRIPTION regress_sub1 CONNECTION '$connstr' PUBLICATION regress_pub1 WITH (failover = true, enabled = false)" ); I think it is better not to create a subscription in the early stage which we wanted to use for the success case. Let's have separate subscriptions for failure and success cases. I think that will avoid the newly added ALTER statements in the patch. -- With Regards, Amit Kapila.