On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > I realized that users who create a logical slot using
> > pg_create_logical_replication_slot() would not be able to enable both
> > options at slot creation, and there is no easy way to enable
> > failover after creating a two_phase-enabled slot. Users would need to
> > use the ALTER_REPLICATION_SLOT replication command, which seems
> > unrealistic for users to use. On the other hand, if we allow creating
> > a logical slot with both failover and two_phase enabled using the SQL API,
> > there is still a chance for this bug to occur. Would it be worth
> > considering that if a logical slot is created with both failover
> > and two_phase enabled using the SQL API, we create the slot with only
> > two_phase=true, then advance the slot until it satisfies
> > restart_lsn >= two_phase_at, and then enable failover?
> >
>
> This means we either need to maintain somewhere that the user has provided
> the failover flag until restart_lsn >= two_phase_at and then set the
> failover flag in the slot

I was thinking of this idea.

> or initially mark it but enable the
> functionality of failover when we reach the condition restart_lsn >=
> two_phase_at.

IIUC the slot could be synchronized to the standby as soon as we
complete DecodingContextFindStartpoint() for a failover-enabled slot.
So we would need some mechanism to make sure that the slot is not
synchronized while we're waiting to reach the condition restart_lsn >=
two_phase_at, even if failover is enabled.

> Both seem to have different kinds of problems. The first
> idea seems to have an issue with persistence, which means we can lose
> track of the flag after the restart.

I think we can do this series of operations while the slot is not yet
persistent, that is, while the slot is still RS_EPHEMERAL.

> The second can mislead the user
> for a long period in cases where prepare and commit have a large time
> gap. I feel this will introduce complexity either in the form of code
> or in giving the information to the user.

Agreed. Both approaches introduce complexity, so we need to weigh the
user-unfriendliness (of not having a proper way to enable failover for
a two_phase-enabled slot using the SQL API) against the risk (of
introducing complexity).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
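The deferred-failover idea discussed in this thread can be modeled as a small state machine. The slot is created with only two_phase enabled and advanced until restart_lsn >= two_phase_at. Only after that is failover enabled, and all of this happens while the slot is still ephemeral. What follows is a minimal C sketch under stated assumptions, not PostgreSQL code. `SketchSlot`, `SlotPersistency`, `slot_is_syncable()`, `advance_slot()` and `create_two_phase_failover_slot()` are hypothetical names. They are loosely modeled on the slot properties mentioned above, not on the real `ReplicationSlot` API. The rule that only persistent, failover-enabled slots are synchronized is an assumption of this sketch. "Advancing" is simulated by adding a fixed step to restart_lsn rather than by decoding WAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT PostgreSQL's real slot structures. */
typedef uint64_t XLogRecPtr;

typedef enum
{
    SLOT_EPHEMERAL,   /* in this sketch: never eligible for synchronization */
    SLOT_PERSISTENT
} SlotPersistency;

typedef struct
{
    bool            two_phase;
    bool            failover;
    XLogRecPtr      restart_lsn;
    XLogRecPtr      two_phase_at;
    SlotPersistency persistency;
} SketchSlot;

/* Hypothetical rule: a sync worker would only copy persistent, failover-enabled slots. */
static bool
slot_is_syncable(const SketchSlot *slot)
{
    return slot->persistency == SLOT_PERSISTENT && slot->failover;
}

/* Hypothetical advance step: pretend decoding moved restart_lsn forward by `step`. */
static void
advance_slot(SketchSlot *slot, XLogRecPtr step)
{
    slot->restart_lsn += step;
}

/*
 * Create the slot with only two_phase enabled and keep it ephemeral while
 * advancing it until restart_lsn >= two_phase_at; only then enable failover
 * and mark the slot persistent.
 */
static SketchSlot
create_two_phase_failover_slot(XLogRecPtr start_lsn, XLogRecPtr two_phase_at,
                               XLogRecPtr step)
{
    SketchSlot slot = {
        .two_phase = true,
        .failover = false,              /* deferred until the condition holds */
        .restart_lsn = start_lsn,
        .two_phase_at = two_phase_at,
        .persistency = SLOT_EPHEMERAL,
    };

    assert(step > 0);                   /* otherwise the loop never terminates */

    while (slot.restart_lsn < slot.two_phase_at)
    {
        assert(!slot_is_syncable(&slot));   /* cannot be synced while waiting */
        advance_slot(&slot, step);
    }

    slot.failover = true;
    slot.persistency = SLOT_PERSISTENT;
    return slot;
}
```

For example, `create_two_phase_failover_slot(100, 250, 40)` advances restart_lsn through 140, 180 and 220 to 260 before enabling failover. The returned slot has restart_lsn = 260 and becomes syncable only at that point. In this sketch, keeping the slot ephemeral until the end stands in for the point made above: there is no persisted "pending failover" flag that could be lost across a restart.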