Here are some review comments for patch v61-0002 ====== doc/src/sgml/logical-replication.sgml
1. + <sect2 id="logical-replication-failover-examples"> + <title>Examples: logical replication failover</title> The current documentation structure (after the patch is applied) looks like this: 30.1. Publication 30.2. Subscription 30.2.1. Replication Slot Management 30.2.2. Examples: Set Up Logical Replication 30.2.3. Examples: Deferred Replication Slot Creation 30.2.4. Examples: logical replication failover I don't think it is ideal. Firstly, I think this new section is not just "Examples:"; it is more like instructions for steps to check if a successful failover is possible. IMO call it something like "Logical Replication Failover" or "Replication Slot Failover". Secondly, I don't think this new section strictly belongs underneath the "Subscription" section anymore because IMO it is just as much about the promotion of the publications. Now that you are adding this new (2nd) section about slots, I think the whole structure of this document should be changed like below: SUGGESTION #1 (make a new section 30.3 just for slot-related topics) 30.1. Publication 30.2. Subscription 30.2.1. Examples: Set Up Logical Replication 30.3. Logical Replication Slots 30.3.1. Replication Slot Management 30.3.2. Examples: Deferred Replication Slot Creation 30.3.3. Logical Replication Failover ~ SUGGESTION #2 (keep the existing structure, but give the failover its own new section 30.3) 30.1. Publication 30.2. Subscription 30.2.1. Replication Slot Management 30.2.2. Examples: Set Up Logical Replication 30.2.3. Examples: Deferred Replication Slot Creation 30.3 Logical Replication Failover ~ SUGGESTION #2a (and maybe later you can extract some of the failover examples further) 30.1. Publication 30.2. Subscription 30.2.1. Replication Slot Management 30.2.2. Examples: Set Up Logical Replication 30.2.3. Examples: Deferred Replication Slot Creation 30.3 Logical Replication Failover 30.3.1. Examples: Checking if failover ready ~~~ 2. + <para> + In a logical replication setup, if the publisher server is also the primary + server of the streaming replication, the logical slots on the primary server + can be synchronized to the standby server by specifying <literal>failover = true</literal> + when creating the subscription. Enabling failover ensures a seamless + transition of the subscription to the promoted standby, allowing it to + subscribe to the new primary server without any data loss. + </para> I was initially confused by the wording. How about like below: SUGGESTION When the publisher server is the primary server of a streaming replication, the logical slots on that primary server can be synchronized to the standby server by specifying <literal>failover = true</literal> when creating subscriptions for those publications. Enabling failover ensures a seamless transition of those subscriptions after the standby is promoted. They can continue subscribing to publications now on the new primary server without any data loss. ~~~ 3. + <para> + However, the replication slots are copied asynchronously, which means it's necessary + to confirm that replication slots have been synced to the standby server + before the failover happens. Additionally, to ensure a successful failover, + the standby server must not lag behind the subscriber. To confirm + that the standby server is ready for failover, follow these steps: + </para> Minor rewording SUGGESTION Because the slot synchronization logic copies asynchronously, it is necessary to confirm that replication slots have been synced to the standby server before the failover happens. Furthermore, to ensure a successful failover, the standby server must not be lagging behind the subscriber. To confirm that the standby server is indeed ready for failover, follow these 2 steps: ~~~ 4. The instructions said "follow these steps", so the next parts should be rendered as 2 "steps" (using <procedure> markup?) SUGGESTION (show as steps 1,2 and also some minor rewording of the step heading) 1. Confirm that all the necessary logical replication slots have been synced to the standby server. 2. Confirm that the standby server is not lagging behind the subscribers. ~~~ 5. + <para> + Check if all the necessary logical replication slots have been synced to + the standby server. + </para> SUGGESTION Confirm that all the necessary logical replication slots have been synced to the standby server. ~~~ 6. + <listitem> + <para> + On logical subscriber, fetch the slot names that should be synced to the + standby that we plan to promote. SUGGESTION Firstly, on the subscriber node, use the following SQL to identify the slot names that should be... ~~~ 7. +<programlisting> +test_sub=# SELECT + array_agg(slotname) AS slots + FROM + (( + SELECT r.srsubid AS subid, CONCAT('pg_' || srsubid || '_sync_' || srrelid || '_' || ctl.system_identifier) AS slotname + FROM pg_control_system() ctl, pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate = 'f' AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT oid AS subid, subslotname as slotname + FROM pg_subscription + WHERE subfailover + )); 7a Maybe this ought to include "pg_catalog" schemas? ~ 7b. For consistency, maybe it is better to use a table alias "FROM pg_subscription s" in the UNION also ~~~ 8. + <listitem> + <para> + Check that the logical replication slots exist on the standby server. SUGGESTION Next, check that the logical replication slots identified above exist on the standby server. ~~~ 9. +<programlisting> +test_standby=# SELECT bool_and(synced AND NOT temporary AND conflict_reason IS NULL) AS failover_ready + FROM pg_replication_slots + WHERE slot_name in ('slots'); + failover_ready +---------------- + t 9a. Maybe this ought to include "pg_catalog" schemas? ~ 9b. IIUC that 'slots' reference is supposed to be those names that were found in the prior step. If so, then that point needs to be made clear, and anyway in this case 'slots' is not compatible with the 'sub' name returned by your first SQL. ~~~ 10. + <listitem> + <para> + Query the last replayed WAL on the logical subscriber. SUGGESTION Firstly, on the subscriber node check the last replayed WAL. ~~~ 11. +<programlisting> +test_sub=# SELECT + MAX(remote_lsn) AS remote_lsn_on_subscriber + FROM + (( + SELECT (CASE WHEN r.srsubstate = 'f' THEN pg_replication_origin_progress(CONCAT('pg_' || r.srsubid || '_' || r.srrelid), false) + WHEN r.srsubstate = 's' THEN r.srsublsn END) as remote_lsn + FROM pg_subscription_rel r, pg_subscription s + WHERE r.srsubstate IN ('f', 's') AND s.oid = r.srsubid AND s.subfailover + ) UNION ( + SELECT pg_replication_origin_progress(CONCAT('pg_' || s.oid), false) AS remote_lsn + FROM pg_subscription s + WHERE subfailover + )); 11a. Maybe this ought to include "pg_catalog" schemas? ~ 11b. /WHERE subfailover/WHERE s.subfailover/ ~~~ 12. + <listitem> + <para> + On the standby server, check that the last-received WAL location + is ahead of the replayed WAL location on the subscriber. SUGGESTION Next, on the standby server check that the last-received WAL location is ahead of the replayed WAL location on the subscriber identified above. ~~~ 13. +</programlisting></para> + </listitem> + <listitem> + <para> + On the standby server, check that the last-received WAL location + is ahead of the replayed WAL location on the subscriber. +<programlisting> +test_standby=# SELECT pg_last_wal_receive_lsn() >= 'remote_lsn_on_subscriber'::pg_lsn AS failover_ready; + failover_ready +---------------- + t IIUC the 'remote_lsn_on_subscriber' is supposed to represent the substitution of the value found in the subscriber server. In this example maybe it would be: SELECT pg_last_wal_receive_lsn() >= '0/3000388'::pg_lsn AS failover_ready; maybe that point can be made more clearly. ~~~ 14. + <para> + If the result (failover_ready) of both above steps is true, it means it is + okay to subscribe to the standby server. + </para> 14a. failover_ready should be rendered as literal. ~ 14b. Does this say what you intended, or did you mean something more like "the standby can be promoted and existing subscriptions will be able to continue without data loss" ====== src/backend/replication/logical/slotsync.c 15. local_slot_update +/* + * Try to update local slot metadata based on the data from the remote slot. + * + * Return false if the data of the remote slot is the same as the local slot. + * Otherwise, return true. + */ There's not really any "try to" here; it either does it if needed or doesn't do it because it's not needed. SUGGESTION If necessary, update local slot metadata based on the data from the remote slot. If no update was needed (the data of the remote slot is the same as the local slot) return false, otherwise true. ~~~ 16. + bool updated_xmin; + bool updated_restart; + Oid dbid; + ReplicationSlot *slot = MyReplicationSlot; + + Assert(slot->data.invalidated == RS_INVAL_NONE); + + updated_xmin = (remote_slot->catalog_xmin != slot->data.catalog_xmin); + updated_restart = (remote_slot->restart_lsn != slot->data.restart_lsn); + dbid = get_database_oid(remote_slot->database, false); + + if (namestrcmp(&slot->data.plugin, remote_slot->plugin) == 0 && + slot->data.database == dbid && !updated_restart && !updated_xmin && + remote_slot->two_phase == slot->data.two_phase && + remote_slot->failover == slot->data.failover && + remote_slot->confirmed_lsn == slot->data.confirmed_flush) + return false; It seems a bit strange to have boolean flags for some of the differences (updated_xmin, updated_restart) but not for the others. I expected it should be for all (e.g. updated_twophase, updated_failover, ...) or none of them. ~~~ 17. synchronize_one_slot + slot_updated = local_slot_update(remote_slot); + + /* Make sure the slot changes persist across server restart */ + if (slot_updated) + { + ReplicationSlotMarkDirty(); + ReplicationSlotSave(); + } IMO this code would be simpler if written like below because then 'slot_updated' is only ever assigned when true instead of maybe overwriting the default again with false: SUGGESTION /* Make sure the slot changes persist across server restart */ if (local_slot_update(remote_slot)) { slot_updated = true; ReplicationSlotMarkDirty(); ReplicationSlotSave(); } ====== src/backend/replication/slot.c 18. ReplicationSlotPersist - TEMPORARY v EPHEMERAL I noticed this ReplicationSlotPersist() from v59-0002 was reverted: - * Convert a slot that's marked as RS_EPHEMERAL to a RS_PERSISTENT slot, - * guaranteeing it will be there after an eventual crash. + * Convert a slot that's marked as RS_EPHEMERAL or RS_TEMPORARY to a + * RS_PERSISTENT slot, guaranteeing it will be there after an eventual crash. AFAIK in v61 you are still calling this function with RS_TEMPORARY which is now contrary to the current function comment if you don't change it to also mention RS_TEMPORARY. ====== Kind Regards, Peter Smith. Fujitsu Australia