Hi Kuroda-san. Here are some review comments for v28-0003.
====== src/bin/pg_upgrade/check.c 1. check_and_dump_old_cluster + /* + * Logical replication slots can be migrated since PG17. See comments atop + * get_old_cluster_logical_slot_infos(). + */ + if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700) + check_old_cluster_for_valid_slots(live_check); + IIUC we are preferring to use the <= 1600 style of version check instead of >= 1700 where possible. So this comment and version check ought to be removed from here, and done inside check_old_cluster_for_valid_slots() instead. ~~~ 2. check_old_cluster_for_valid_slots +/* + * check_old_cluster_for_valid_slots() + * + * Make sure logical replication slots can be migrated to new cluster. + * Following points are checked: + * + * - All logical replication slots are usable. + * - All logical replication slots consumed all WALs, except a + * CHECKPOINT_SHUTDOWN record. + */ +static void +check_old_cluster_for_valid_slots(bool live_check) I suggested in the previous comment above (#1) that the version check should be moved into this function. Therefore, this function comment now should also mention slot upgrade is only allowed for >= PG17 ~~~ 3. +static void +check_old_cluster_for_valid_slots(bool live_check) +{ + int i, + ntups, + i_slotname; + PGresult *res; + DbInfo *active_db = &old_cluster.dbarr.dbs[0]; + PGconn *conn; + + /* Quick exit if the cluster does not have logical slots. */ + if (count_old_cluster_logical_slots() == 0) + return; 3a. See comment #1. At the top of this function body there should be a version check like: if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600) return; ~ 3b. /Quick exit/Quick return/ ~ 4. + prep_status("Checking for logical replication slots"); I felt that should add the word "valid" like: "Checking for valid logical replication slots" ~~~ 5. + /* Check there are no logical replication slots with a 'lost' state. */ + res = executeQueryOrDie(conn, + "SELECT slot_name FROM pg_catalog.pg_replication_slots " + "WHERE wal_status = 'lost' AND " + "temporary IS FALSE;"); Since the SQL is checking if there *are* lost slots I felt it would be more natural to reverse that comment. SUGGESTION /* Check and reject if there are any logical replication slots with a 'lost' state. */ ~~~ 6. + /* + * Do additional checks if a live check is not required. This requires + * that confirmed_flush_lsn of all the slots is the same as the latest + * checkpoint location, but it would be satisfied only when the server has + * been shut down. + */ + if (!live_check) I think the comment can be rearranged slightly: SUGGESTION Do additional checks to ensure that 'confirmed_flush_lsn' of all the slots is the same as the latest checkpoint location. Note: This can be satisfied only when the old_cluster has been shut down, so we skip this for "live" checks. ====== src/bin/pg_upgrade/controldata.c 7. + /* + * Read the latest checkpoint location if the cluster is PG17 + * or later. This is used for upgrading logical replication + * slots. + */ + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700) + { Fetching this "Latest checkpoint location:" value is only needed for the check_old_cluster_for_valid_slots validation check, isn't it? But AFAICT this code is common for both old_cluster and new_cluster. I am not sure what is best to do: - Do only the minimal logic needed? - Read the value redundantly even for new_cluster just to keep code simpler? Either way, maybe the comment should say something about this. ====== .../t/003_logical_replication_slots.pl 8. Consider adding one more test Maybe there should also be some "live check" test performed (e.g. using --check, and a running old_cluster). This would demonstrate pg_upgrade working successfully even when the WAL records are not consumed (because LSN checks would be skipped in check_old_cluster_for_valid_slots function). ------ Kind Regards, Peter Smith. Fujitsu Australia