On Fri, Apr 18, 2025 at 9:58 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Thu, Apr 17, 2025 at 6:14 PM Zhijie Hou (Fujitsu)
> <houzj.f...@fujitsu.com> wrote:
> >
> > -----
> > Fix
> > -----
> >
> > I think we should keep the confirmed_flush even if the previous synced
> > restart_lsn/catalog_xmin is newer. Attachments include a patch for the same.
> >
>
> This will fix the case we are facing but adds a new rule for slot
> synchronization. Can we think of a simpler way to fix this by avoiding
> updating other slot fields (like two_phase, two_phase_at) if
> restart_lsn or catalog_xmin of the local slot is ahead of the remote
> slot?
>
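
To make the simpler rule quoted above concrete, here is a rough sketch of
what I have in mind. It is a toy Python model, not the actual slot
synchronization code: the SlotState type and the sync_slot() function are
hypothetical names made up for illustration, and LSNs and xids are modelled
as plain integers (real xid comparison is wraparound-aware).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotState:
    """Hypothetical, simplified view of the slot fields discussed here."""
    restart_lsn: int
    catalog_xmin: int
    confirmed_flush: int
    two_phase: bool
    two_phase_at: int


def sync_slot(local: SlotState, remote: SlotState) -> SlotState:
    """Return the local slot state after one (toy) synchronization cycle."""
    local_is_ahead = (
        local.restart_lsn > remote.restart_lsn
        or local.catalog_xmin > remote.catalog_xmin
    )
    if local_is_ahead:
        # Assumed rule: when the local slot is ahead, touch nothing at all,
        # so two_phase / two_phase_at are never updated on their own.
        return local
    # Otherwise take every field from the remote slot together.
    return remote
```

The point of the sketch is only that either all fields move together or
none of them do; it says nothing about how the real code should detect or
report the skipped cycle.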

Thinking more about this problem, it seems to me that if the synced
slot's catalog_xmin is allowed to be ahead of the remote slot's while
there is still an open (prepared) transaction, it could cause data
loss. I mean that after promotion, some of the required catalog rows
could be removed, and decoding the corresponding changes (changes from
tables affected by DDL) could give unexpected results. Those rows would
be protected on the primary/publisher because its catalog_xmin was
still accurate and behind. If this theory turns out to be true, then
this is a drawback/bug of the existing fast_forward mode code.

-- 
With Regards,
Amit Kapila.