On Tue, Mar 25, 2025 at 11:05 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:
>
> Hi,
>
> When testing the slot synchronization with logical replication slots that
> enabled two_phase decoding, I found that transactions prepared before two-phase
> decoding is enabled may fail to replicate to the subscriber after being
> committed on a promoted standby following a failover.
>
> To reproduce this issue, please follow these steps (also detailed in the
> attached TAP test, v1-0001):
>
> 1. sub: create a subscription with (two_phase = false)
> 2. primary (pub): prepare a txn A.
> 3. sub: alter subscription set (two_phase = true) and wait for the logical
>    slot to be synced to standby.
> 4. primary (pub): stop primary, promote the standby and let the subscriber use
>    the promoted standby as publisher.
> 5. promoted standby (pub): COMMIT PREPARED A;
> 6. sub: the apply worker will report the following ERROR because it didn't
>    receive the PREPARE.
>    ERROR: prepared transaction with identifier "pg_gid_16387_752" does not exist
>
> I think the root cause of this issue is that the two_phase_at field of the
> slot, which indicates the LSN from which two-phase decoding is enabled (used to
> prevent duplicate data transmission for prepared transactions), is not
> synchronized to the standby server.
>
> In step 3, transaction A is not immediately replicated because it occurred
> before enabling two-phase decoding. Thus, the prepared transaction should only
> be replicated after decoding the final COMMIT PREPARED, as referenced in
> ReorderBufferFinishPrepared(). However, due to the invalid two_phase_at on the
> standby, the prepared transaction fails to send at that time.
>
> This problem arises after the support for altering the two-phase option
> (1462aad).
>
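
To make the failure mode described above concrete, here is a toy model of the check, not PostgreSQL's actual code. It assumes that when COMMIT PREPARED is decoded, the prepare is replayed only if its LSN is below the slot's two_phase_at. The names Slot, INVALID_LSN and messages_at_commit_prepared are hypothetical, and the LSN values are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an invalid (unset) LSN.
INVALID_LSN = 0


@dataclass
class Slot:
    """Toy model of a logical slot; only the field discussed above."""
    two_phase_at: int  # LSN from which two-phase decoding is enabled


def messages_at_commit_prepared(slot: Slot, prepare_lsn: int) -> list[str]:
    """Return what a decoder would send on decoding COMMIT PREPARED.

    Assumption: a transaction prepared before two_phase_at was skipped at
    PREPARE time, so it has to be sent now, ahead of the commit.
    """
    msgs = []
    if prepare_lsn < slot.two_phase_at:
        # The prepare predates two-phase decoding: send it now.
        msgs.append("PREPARE")
    msgs.append("COMMIT PREPARED")
    return msgs


# Slot on the primary: two_phase_at was set when the option was altered.
primary_slot = Slot(two_phase_at=200)
# Synced slot on the standby: two_phase_at was not copied over.
standby_slot = Slot(two_phase_at=INVALID_LSN)

# Transaction A was prepared at LSN 100, before two-phase was enabled.
print(messages_at_commit_prepared(primary_slot, 100))
print(messages_at_commit_prepared(standby_slot, 100))
```

In this model the primary's slot sends the prepare followed by the commit, while the synced slot sends only COMMIT PREPARED because no LSN is below an unset two_phase_at. That is consistent with the "prepared transaction ... does not exist" error in step 6.
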
Thanks for the report and patch. I'll look into it.

--
With Regards,
Amit Kapila.