On Fri, Sep 8, 2023 at 6:31 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote:
Comments on the latest patch. 1. Note that slot restoration must be done after the final pg_resetwal command during the upgrade because pg_resetwal will remove WALs that are required by the slots. Due to this restriction, the timing of restoring replication slots is different from other objects. This comment in the commit message is confusing. I understand the reason but from this, it is not very clear that if resetwal removes the WAL we needed then why it is good to create after the resetwal. I think we should make it clear that creating new slot will set the restart lsn to current WAL location and after that resetwal can remove those WAL where slot restart lsn is pointing.... 2. + <itemizedlist> + <listitem> + <para> + All slots on the old cluster must be usable, i.e., there are no slots + whose + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>wal_status</structfield> + is <literal>lost</literal>. + </para> + </listitem> + <listitem> + <para> + <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>confirmed_flush_lsn</structfield> + of all slots on the old cluster must be the same as the latest + checkpoint location. + </para> + </listitem> + <listitem> + <para> + The output plugins referenced by the slots on the old cluster must be + installed in the new PostgreSQL executable directory. + </para> + </listitem> + <listitem> + <para> + The new cluster must have + <link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link> + configured to a value greater than or equal to the number of slots + present in the old cluster. + </para> + </listitem> + <listitem> + <para> + The new cluster must have + <link linkend="guc-wal-level"><varname>wal_level</varname></link> as + <literal>logical</literal>. + </para> + </listitem> + </itemizedlist> I think we should also add that the new slot should not have any permanent existing logical replication slot. 3. - with the primary.) Replication slots are not copied and must - be recreated. + with the primary.) Replication slots on the old standby are not copied. + Only logical slots on the primary are migrated to the new standby, + and other slots must be recreated. This paragraph should be rephrased. I mean first stating that "Replication slots on the old standby are not copied" and then saying Only logical slots are migrated doesn't seem like the best way. Maybe we can just say "Only logical slots on the primary are migrated to the new standby, and other slots must be recreated." 4. + /* + * Raise an ERROR if the logical replication slot is invalidating. It + * would not happen because max_slot_wal_keep_size is set to -1 during + * the upgrade, but it stays safe. + */ + if (*invalidated && SlotIsLogical(s) && IsBinaryUpgrade) + elog(ERROR, "Replication slots must not be invalidated during the upgrade."); Rephrase the first line as -> Raise an ERROR if the logical replication slot is invalidating during an upgrade. 5. + /* Logical slots can be migrated since PG17. */ + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600) + return; For readability change this to if (GET_MAJOR_VERSION(old_cluster.major_version) < 1700), because in most of the checks related to this, we are using 1700 so better be consistent in this. 6. + if (nslots_on_new) + pg_fatal(ngettext("New cluster must not have logical replication slots but found %d slot.", + "New cluster must not have logical replication slots but found %d slots.", + nslots_on_new), + nslots_on_new); ... + if (PQntuples(res) != 1) + pg_fatal("could not determine wal_level."); + + wal_level = PQgetvalue(res, 0, 0); + + if (strcmp(wal_level, "logical") != 0) + pg_fatal("wal_level must be \"logical\", but is set to \"%s\"", + wal_level); I have noticed that the case of the first letter in the pg_fatal message is not consistent. 7. + + /* Is the slot still usable? */ + if (slot->invalid) + { Why comment says "Is the slot still usable?" I think it should be "Is the slot usable?" otherwise it appears that we have first fetched the slots and now we are refetching it and checking whether it is still usable. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com