On Thu, Apr 3, 2025 at 7:50 AM Zhijie Hou (Fujitsu)
<houzj.f...@fujitsu.com> wrote:
>
> On Thu, Apr 3, 2025 at 3:30 AM Masahiko Sawada wrote:
>
> >
> > On Wed, Apr 2, 2025 at 6:33 AM Zhijie Hou (Fujitsu)
> > <houzj.f...@fujitsu.com> wrote:
> >
> > Thank you for the explanation! I agree that the issue happens in these
> > cases.
> >
> > As another idea, I wonder if we could somehow defer to make the synced
> > slot as 'sync-ready' until we can ensure that the slot doesn't have
> > any transactions that are prepared before the point of enabling
> > two_phase. For example, when the slotsync worker fetches the remote
> > slot, it remembers the confirmed_flush_lsn (say LSN-1) if the local
> > slot's two_phase becomes true or the local slot is newly created with
> > enabling two_phase, and then it makes the slot 'sync-ready' once it
> > confirmed that the slot's restart_lsn passed LSN-1. Does it work?
>
> Thanks for the idea!
>
> We considered a similar approach in [1] to confirm there is no prepared
> transactions before two_phase_at, but the issue is that when the two_phase
> flag is switched from 'false' to 'true' (as in the case with
> (copy_data=true, failover=true, two_phase=true)). In this case, the slot
> may have already been marked as sync-ready before the two_phase flag is
> enabled, as slotsync is unaware of potential future changes to the
> two_phase flag.
>

This can happen because, when copy_data is true, tablesync can take a long
time to complete, and in the meantime the slot, still without the two_phase
flag, would have been synced to the standby. Such a slot would be marked as
sync-ready even if we follow the calculation proposed by Sawada-san. Note
that we enable two_phase once all the tables are in the ready state (see
run_apply_worker() and the comments atop worker.c (TWO_PHASE TRANSACTIONS)).

-- 
With Regards,
Amit Kapila.
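
The timing problem described above can be sketched as a toy simulation. This
is not the slotsync code: RemoteSlot, SyncedSlot, sync_once, wait_for_lsn and
the LSN values are all hypothetical names and numbers chosen for
illustration, and the model assumes that the proposed rule only delays the
first transition to sync-ready and never clears a flag that an earlier cycle
already set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RemoteSlot:
    """Hypothetical view of the slot on the primary."""
    two_phase: bool
    restart_lsn: int
    confirmed_flush_lsn: int


@dataclass
class SyncedSlot:
    """Hypothetical view of the synced slot on the standby."""
    two_phase: bool = False
    sync_ready: bool = False
    wait_for_lsn: Optional[int] = None  # "LSN-1" in the proposal


def sync_once(local: Optional[SyncedSlot], remote: RemoteSlot) -> SyncedSlot:
    """One simplified slotsync cycle under the proposed deferral rule."""
    newly_created = local is None
    if newly_created:
        local = SyncedSlot()

    # Remember LSN-1 when two_phase is first seen as enabled, either on a
    # newly created slot or on a flip from false to true.
    if remote.two_phase and (newly_created or not local.two_phase):
        local.wait_for_lsn = remote.confirmed_flush_lsn
    local.two_phase = remote.two_phase

    # Assumption: the rule can only delay the first transition to
    # sync-ready; it does not revoke a flag that is already set.
    if not local.sync_ready:
        if local.wait_for_lsn is None or remote.restart_lsn > local.wait_for_lsn:
            local.sync_ready = True
    return local


def run_scenario() -> List[bool]:
    """copy_data=true: two_phase is enabled only after tablesync finishes."""
    history = []
    # Cycle 1: tablesync is still running, the slot has no two_phase flag.
    local = sync_once(None, RemoteSlot(False, restart_lsn=100, confirmed_flush_lsn=120))
    history.append(local.sync_ready)
    # Cycle 2: all tables are ready and two_phase is switched on, but
    # restart_lsn (150) has not passed the remembered LSN-1 (200).
    local = sync_once(local, RemoteSlot(True, restart_lsn=150, confirmed_flush_lsn=200))
    history.append(local.sync_ready)
    return history


if __name__ == "__main__":
    print(run_scenario())
```

In this model the slot is already sync-ready after the first cycle, so
remembering LSN-1 in the second cycle has no effect. A slot created with
two_phase already enabled would instead be held back until restart_lsn
passes LSN-1, which is the case the proposal does cover.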