On Fri, May 30, 2025 at 12:55 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > On Thu, May 29, 2025 at 2:00 AM Nisha Moond <nisha.moond...@gmail.com> wrote: > > > > On Wed, May 28, 2025 at 6:10 AM Masahiko Sawada <sawada.m...@gmail.com> > > wrote: > > > > > > On Sun, May 25, 2025 at 11:34 PM Nisha Moond <nisha.moond...@gmail.com> > > > wrote: > > > > > > > > to > > > > On Fri, May 23, 2025 at 10:06 PM Masahiko Sawada > > > > <sawada.m...@gmail.com> wrote: > > > > > > > > > > > > > > > Here are review comments for v14 patch: > > > > > > > > > > > > > Thank you for the review. > > > > > > > > > I think we need to include a basic test case where we simply create a > > > > > subscription with two_phase=true and then enable the failover via > > > > > ALTER SUBSCRIPTION after making sure that slot's restart_lsn passed > > > > > two_phase_at. Probably if we use this case in 003_logical_slots.pl, we > > > > > can avoid creating regress_sub2? > > > > > > > > > > > > > A test has been added in 040_standby_failover_slots_sync.pl to enable > > > > failover via ALTER SUBSCRIPTION. > > > > > > Yes but the slot is originally created via SQL API, which seems > > > uncommon usage in practice. I thought it would be good to have the > > > basic steps in the tests to enable both fields. > > > > > > > Apologies for the earlier misunderstanding. I've updated > > 040_standby_failover_slots_sync.pl to modify the slot creation via the > > CREATE SUBSCRIPTION command as suggested. > > > > > > Regarding regress_sub2 in 003_logical_slots.pl : > > > > If we use ALTER SUBSCRIPTION to set failover=true on regress_sub, it > > > > leads to pg_upgrade failure, as it attempts to create a slot on the > > > > new_node(upgraded version) with both two_phase and failover enabled, > > > > which is an unsupported combination. > > > > > > I think that the pg_upgrade test should cover the case where it > > > restores logical slots with both fields enabled in the new cluster. > > > When I actually tried this case, I realized that pg_upgrade doesn't > > > handle this case; it tried to create the logical slot via SQL API but > > > it failed as we don't allow it to create a logical slot with enabling > > > both two_phase and failover. How can we handle this case? > > > > > > > On further analysis, it seems feasible and safe to allow creation of > > slot and subscription with both two_phase and failover enabled during > > upgrade (pg_upgrade). > > As before starting the upgrade, pg_upgrade ensures that - > > a) All slot changes have been fully consumed. > > b) No prepared transactions exist. > > Additionally, it is also documented[1] - "Obviously, no one should be > > accessing the clusters during the upgrade." > > > > Though, if allowed, the slot may be created with a restart_lsn < > > two_phase_at. IIUC, the guarantees above make it impossible for a > > prepared transaction to exist before two_phase_at, thus preventing the > > bug case. > > True, but I think we need to cover the simple pg_dump and pg_restore > case too, without pg_upgrade. Otherwise the dump file won't work if > the database has at least one subscription with two_phase and failover > enabled. >
Agree that we need to cover the simple pg_dump and pg_restore with the patch. When pg_dump and pg_restore are used outside of pg_upgrade, there's no guarantee that the target system does not have any prepared transactions. In such cases, restoring a subscription with both two_phase and failover enabled could lead to the bug, so we should avoid allowing both options via pg_restore. Here are a few possible solutions: 1) Split the CREATE SUBSCRIPTION command of such subscriptions in pg_restore : first create the sub with two_phase: - CREATE SUBSCRIPTION ... with (two_phase = on) then enable failover - ALTER SUBSCRIPTION ... with (failover=true) However, this approach adds complexity in pg_restore and may still fail if the slot's restart_lsn is earlier than two_phase_at. 2) Prevent dump of subscription in non-upgrade mode: Raise an error/warning during pg_dump when a subscription with both two_phase and failover is encountered (unless in binary upgrade mode). That is, either fail the pg_dump, or skip dumping subscriptions in such a case with a warning. We can include a helpful hint: "Disable failover for subscription <sub-name> before dumping and re-enable it after restore." 3) Dump such subscriptions with (two_phase=on, failover=on, create_slot=false) together. As pg_dump always sets "connect=false" for subscription, it is up to users to reactivate the subscription suitably. In this case, we can add some document to warn that it's the user's responsibility to ensure the remote slot is in a safe state. What do you think? Please let me know if you have any other suggestions for handling this in pg_dump/pg_restore. -- Thanks & Regards, Nisha Moond