On Fri, Aug 8, 2025 at 7:01 PM Fabrice Chapuis <fabrice636...@gmail.com> wrote:
>
> Thanks Shveta for coming on this point again and fixing the link.
> The idea is to check if the slot has same name to try to resynchronize it
> with the primary.
> ok the check on the failover status for the remote slot is perhaps redundant.
> I'm not sure what impact setting the synced flag to true might have. But if
> you run an additional switchover, it works fine because the synced flag on
> the new primary is set to true now.
> If we come back to the idea of the GUC or the API, adding an allow_overwrite
> parameter to the pg_create_logical_replication_slot function and removing the
> logical slot when set to true could be a suitable approach.
>
> What is your opinion?
>

If implemented as a GUC, it would address only a specific corner case,
which makes a GUC less suitable. OTOH, adding it as a slot property
makes more sense. You can start by introducing a new slot property,
allow_overwrite. By default, this property will be set to false.

a) The function pg_create_logical_replication_slot() can be extended to
accept this parameter.
b) A new API pg_alter_logical_replication_slot() can be introduced to
modify this property after slot creation if needed.
c) The commands CREATE SUBSCRIPTION and ALTER SUBSCRIPTION do not need
to include an allow_overwrite parameter. When CREATE SUBSCRIPTION
creates a slot, it will always set allow_overwrite to false. If users
need to change this later, they can use the new API
pg_alter_logical_replication_slot() to update the property.
d) Additionally, pg_alter_logical_replication_slot() can serve as a
generic API to modify other slot properties as well.

This appears to be a reasonable idea with potential use cases beyond
just allowing synchronization post switchover. Thoughts?

~~~

Another problem, as you pointed out, is inconsistent behaviour across
switchovers. On the first switchover, we get this error on the new
standby:
"Exiting from slot synchronization because a slot with the same name
already exists on the standby."
But in the case of a double switchover, this error does not occur. This
is because the 'synced' flag is not set on the new standby after the
first switchover, while it is set after a double switchover. I think the
behaviour should be the same: in both cases, it should emit the same
error. We are thinking of a potential solution here and will start a new
thread if needed.

thanks
Shveta
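
For illustration only, the standby-side decision discussed above could be
modelled roughly as follows. This is a toy Python sketch, not PostgreSQL
code: LocalSlot, SlotSyncError, and sync_slot are hypothetical names, and
allow_overwrite is the property proposed in this thread, not an existing
one. Keeping allow_overwrite set after an overwrite is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class LocalSlot:
    name: str
    synced: bool = False           # true only for slots created by slot sync
    allow_overwrite: bool = False  # proposed property, false by default


class SlotSyncError(Exception):
    """Raised when a same-named local slot blocks synchronization."""


def sync_slot(local_slots: dict, remote_name: str) -> str:
    """Decide what the standby does for one remote failover slot."""
    local = local_slots.get(remote_name)
    if local is None:
        # No local slot with this name: create it as a synced slot.
        local_slots[remote_name] = LocalSlot(remote_name, synced=True)
        return "created"
    if local.synced:
        # Already a synced slot (the double-switchover case): just update it.
        return "updated"
    if local.allow_overwrite:
        # Hypothetical new path: replace the local slot with a synced one.
        # Assumption: the replacement keeps allow_overwrite set.
        local_slots[remote_name] = LocalSlot(
            remote_name, synced=True, allow_overwrite=True
        )
        return "overwritten"
    # The first-switchover case: same name, synced flag not set.
    raise SlotSyncError(
        "Exiting from slot synchronization because a slot with the "
        "same name already exists on the standby."
    )
```

In this model, a leftover slot with synced = false raises the error unless
allow_overwrite was set beforehand, which is what a call to the proposed
pg_alter_logical_replication_slot() would be used for.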