On Mon, Jul 8, 2024 at 12:34 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote: > > > Another possible problem is related to my use case. I haven't reproduced > > this > > case, just some thoughts. I guess, when two_phase is ON, the PREPARE > > statement > > may be truncated from the WAL at checkpoint, but COMMIT PREPARED is still > > kept > > in the WAL. On catchup, I would ask the master to send transactions from > > some > > restart LSN. I would like to get all such transactions competely, with > > theirs > > bodies, not only COMMIT PREPARED messages. > > I don't think it is a real issue. WALs for prepared transactions will retain > until they are committed/aborted. > When the two_phase is on and transactions are PREPAREd, they will not be > cleaned up from the memory (See ReorderBufferProcessTXN()). Then, > RUNNING_XACT > record leads to update the restart_lsn of the slot but it cannot be move > forward > because ReorderBufferGetOldestTXN() returns the prepared transaction (See > SnapBuildProcessRunningXacts()). restart_decoding_lsn of each transaction, > which > is a candidate of restart_lsn of the slot. is always behind the startpoint of > its txn. >
I see that in 0003/0004, the patch first aborts pending prepared transactions, update's catalog, and then change slot's property via walrcv_alter_slot. What if there is any ERROR (say the remote node is not reachable or there is an error while updating the catalog) after we abort the pending prepared transaction? Won't we end up with lost prepared transactions in such a case? Few other comments: ================= The code to handle SUBOPT_TWOPHASE_COMMIT should be after failover option handling for the sake of code symmetry. Also, the checks should be in same order like first for slot_name, then enabled, then for PreventInTransactionBlock(), after those, we can have other checks for two_phase. If possible, we can move common checks in both failover and two_phase options into a common function. What should be the behavior if one tries to set slot_name to NONE and also tries to toggle two_pahse option? I feel both options together don't makes sense because there is no use in changing two_phase for some slot which we are disassociating the subscription from. The same could be said for the failover option as well, so if we agree with some different behavior here, we can follow the same for failover option as well. -- With Regards, Amit Kapila.