Thanks for your reply. The problem I see is that after creating a new subscription, we have:
1) If a failover occurs, the failover and synced flags are both set to true on the new primary node, so there is no problem.
2) When the old node rejoins the cluster as a secondary, its slot has failover = true and synced = false, and the slot sync worker raises this error:

ERROR: exiting from slot synchronization because same name slot "sub_test" already exists on the standby

Why not set the synced flag when the standby joins the cluster? If the slot on the primary node has the same name as the slot on the secondary node and the failover flag is set to true, something like:

if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
    slot->data.synced = true;
    ...

Thanks for your feedback

On Wed, Jun 11, 2025 at 6:48 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote:

> On Tue, Jun 10, 2025 at 11:46 PM Fabrice Chapuis wrote:
> > I'm working with logical replication in a PostgreSQL 17 setup, and I'm
> > exploring the new option to make replication slots failover safe in a highly
> > available environment using physical standby nodes managed by Patroni.
> >
> > After a switchover, I encounter an error message in the PostgreSQL logs and
> > observe unexpected behavior.
> > Here are the different steps I followed:
> >
> > 1) Setting up a new subscription
> >
> > Logical replication is established between two databases on the same
> > PostgreSQL instance.
> >
> > A logical replication slot is created on the source database:
> >
> > SELECT pg_create_logical_replication_slot('sub_test', 'pgoutput', false, false, true);
> >
> > A subscription is then configured on the target database:
> >
> > CREATE SUBSCRIPTION sub_test CONNECTION 'dbname=test host=localhost port=5432 user=user_test'
> > PUBLICATION pub_test WITH (create_slot=false, copy_data=false, failover=true);
> >
> > The logical replication slot is active and in failover mode.
> >
> > 2) Starting the physical standby
> >
> > A logical replication slot is successfully created on the standby
> >
> > 3) Cluster switchover
> >
> > The switchover is initiated using the Patroni command:
> >
> > patronictl switchover
> >
> > The operation completes successfully, and roles are reversed in the cluster.
> > ...
> > 4) Issue encountered
> > After the switchover, an error appears in the PostgreSQL logs:
> >
> > 2025-06-10 16:40:58.996 CEST [739829]: [1-1] user=,db=,client=,application= LOG: slot sync worker started
> > 2025-06-10 16:40:59.011 CEST [739829]: [2-1] user=,db=,client=,application= ERROR: exiting from slot synchronization because same name slot "sub_test" already exists on the standby
> > ...
> > 5) Dropping the slot
> >
> > If the slot on the standby is deleted, it is then recreated with synced = true,
> > and at that point, it successfully resynchronizes with the primary slot.
> > Everything works correctly.
> >
> > Question:
> > Why does the synced flag fail to change to true, even though
> > sync_replication_slots is enabled (on)?
>
> Thank you for reporting this. This behavior is expected because overwriting
> existing slots on standbys is not permitted for now. Doing so poses a risk of
> rendering slots created by users for other purposes unusable.
>
> However, if needed, we could permit overwriting when the existing slot has
> failover=true, given that enabling failover for slots on standbys is currently
> disallowed, but this assumption might change in the future if we support
> enabling failover to allow slot syncing to cascading standbys. Alternatively,
> we could introduce options, such as a GUC, to control whether to overwrite
> existing slots though not sure if it's worth it.
>
> From a database user's perspective, it's necessary to clean up any leftover
> slots on a new standby following a switchover, regardless of whether the
> failover slot feature is supported. Because those leftover slots could lead to
> excessive WAL accumulation.
>
> Best Regards,
> Hou zj
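
To make the proposal in this thread concrete, here is a rough standalone sketch of the decision being discussed. It is a toy model only, not PostgreSQL source: ToySlot, SyncAction and decide_sync_action are hypothetical names invented for illustration, and the "adopt" branch shows the proposed behavior (overwrite only when the existing slot has failover = true), not what PostgreSQL 17 does today.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the slot flags discussed in this
 * thread. This is NOT PostgreSQL's ReplicationSlot struct. */
typedef struct ToySlot
{
    const char *name;
    bool        failover;   /* slot was created with failover = true */
    bool        synced;     /* slot is maintained by the slot sync worker */
} ToySlot;

typedef enum SyncAction
{
    SYNC_CREATE,    /* no local slot with that name: create it as synced */
    SYNC_UPDATE,    /* local slot is already synced: just refresh it */
    SYNC_ADOPT,     /* proposed: leftover failover slot, mark it synced */
    SYNC_ERROR      /* same-name slot without failover: refuse to overwrite */
} SyncAction;

/* Decide what a standby would do with a remote failover slot, given the
 * local slot of the same name (NULL if there is none). */
SyncAction
decide_sync_action(ToySlot *local, bool remote_failover)
{
    if (local == NULL)
        return SYNC_CREATE;

    if (local->synced)
        return SYNC_UPDATE;

    /* Proposed rule: a non-synced slot with failover = true is assumed to
     * be a leftover from the node's time as primary, so it may be adopted. */
    if (local->failover && remote_failover)
    {
        local->synced = true;
        return SYNC_ADOPT;
    }

    /* Anything else may be a slot a user created for another purpose. */
    return SYNC_ERROR;
}
```

With this model, a leftover slot { "sub_test", failover = true, synced = false } would be adopted, while a same-name slot with failover = false would still produce the error. As noted in the quoted reply, the rule depends on failover not being settable for slots on standbys, which may change if slot syncing to cascading standbys is supported.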