Notwithstanding the test errors I am getting for v24-0003, here are some code review comments for this patch anyway.
====== src/bin/pg_upgrade/check.c 1. check_for_lost_slots + +/* + * Verify that all logical replication slots are usable. + */ +void +check_for_lost_slots(ClusterInfo *cluster) 1a. AFAIK we don't ever need to call this also for 'new_cluster'. So the function should have no parameter and just access 'old_cluster' directly. ~ 1b. Can't this be a static function now? ~ 2. + for (i = 0; i < ntups; i++) + pg_log(PG_WARNING, + "\nWARNING: logical replication slot \"%s\" is in 'lost' state.", + PQgetvalue(res, i, i_slotname)); Is it correct that this message also includes the word "WARNING"? Other PG_WARNING messages don't do that. ~~~ 3. check_for_confirmed_flush_lsn +/* + * Verify that all logical replication slots consumed all WALs, except a + * CHECKPOINT_SHUTDOWN record. + */ +static void +check_for_confirmed_flush_lsn(ClusterInfo *cluster) AFAIK we don't ever need to call this also for 'new_cluster'. So the function should have no parameter and just access 'old_cluster' directly. ~ 4. + for (i = 0; i < ntups; i++) + pg_log(PG_WARNING, + "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet", + PQgetvalue(res, i, i_slotname)); Is it correct that this message also includes the word "WARNING"? Other PG_WARNING messages don't do that. ====== src/bin/pg_upgrade/controldata.c 5. get_control_data + else if ((p = strstr(bufin, "Latest checkpoint location:")) != NULL) + { + /* + * Gather the latest checkpoint location if the cluster is PG17 + * or later. This is used for upgrading logical replication + * slots. + */ + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) But we are not "gathering" anything. It's just one LSN. I think this ought to just say "Read the latest..." ~ 6. + /* + * The upper and lower part of LSN must be read separately + * because it is reported in %X/%X format. + */ /reported/stored as/ ====== src/bin/pg_upgrade/pg_upgrade.h 7. +void check_for_lost_slots(ClusterInfo *cluster);\ Why is this needed here? Can't this be a static function? ====== .../t/003_logical_replication_slots.pl 8. +# 2. Consume WAL records to avoid another type of upgrade failure. It will be +# tested in subsequent cases. +$old_publisher->safe_psql('postgres', + "SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL, NULL);" +); I wondered if that step really needed. Why will there be WAL records to consume? IIUC we haven't published anything yet. ~~~ 9. +# ------------------------------ +# TEST: Successful upgrade + +# Preparations for the subsequent test: +# 1. Remove the remained slot +$old_publisher->start; +$old_publisher->safe_psql('postgres', + "SELECT * FROM pg_drop_replication_slot('test_slot1');" +); Should removal of the slot be done as part of the cleanup of the previous test, instead of preparing for this one? ~~~ 10. # 3. Disable the subscription once $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub DISABLE"); $old_publisher->stop; 10a. What do you mean by "once"? ~ 10b. That old_publisher->stop; seems strangely placed. Why is it here? ~~~ 11. # Check that the slot 'test_slot1' has migrated to the new cluster $new_publisher->start; my $result = $new_publisher->safe_psql('postgres', "SELECT slot_name, two_phase FROM pg_replication_slots"); is($result, qq(sub|t), 'check the slot exists on new cluster'); ~ That comment now seems wrong. That slot was previously removed, right? ~~~ 12. # Update the connection my $new_connstr = $new_publisher->connstr . ' dbname=postgres'; $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub CONNECTION '$new_connstr'"); $subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE"); ~ Maybe better to combine both SQL. ------ Kind Regards, Peter Smith. Fujitsu Australia