Thanks for your reply.
The problem I see is the following. After creating a new subscription:

1) If a failover occurs, the slot on the new primary node has both the
failover and synced flags set to true, so there is no problem.

2) When the old primary returns to the cluster as a standby, its slot has
the failover flag set to true and the synced flag set to false, and the
slot sync worker then reports:
ERROR: exiting from slot synchronization because same name slot "sub_test"
already exists on the standby
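
The two flags can be inspected on each node with a query along these lines.
This is only a minimal sketch, assuming PostgreSQL 17, where the
pg_replication_slots view exposes the failover and synced columns; the slot
name is the one from the test setup described in this thread.

```sql
-- Run on both the primary and the standby to compare the slot state.
SELECT slot_name,
       failover,                           -- true if the slot is marked for failover
       synced,                             -- true if the slot was synced from the primary
       pg_is_in_recovery() AS is_standby   -- true when the query runs on a standby
FROM pg_replication_slots
WHERE slot_name = 'sub_test';
```

On the rejoined standby in case 2, this would be expected to show
failover = t and synced = f for the leftover slot.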

Why not change the value of the synced flag when the standby joins the
cluster? If the slot on the primary node has the same name as the slot on
the standby and the failover flag is set to true, the existing slot could
be marked as synced, for example:

if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
    slot->data.synced = true;
    ...

Thanks for your feedback.

On Wed, Jun 11, 2025 at 6:48 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com>
wrote:

> On Tue, Jun 10, 2025 at 11:46 PM Fabrice Chapuis wrote:
> > I'm working with logical replication in a PostgreSQL 17 setup, and I'm
> > exploring the new option to make replication slots failover safe in a
> > highly available environment using physical standby nodes managed by
> > Patroni.
> >
> > After a switchover, I encounter an error message in the PostgreSQL logs
> > and observe unexpected behavior.
> > Here are the different steps I followed:
> >
> > 1) Setting up a new subscription
> >
> > Logical replication is established between two databases on the same
> > PostgreSQL instance.
> >
> > A logical replication slot is created on the source database:
> >
> > SELECT pg_create_logical_replication_slot('sub_test', 'pgoutput', false, false, true);
> >
> > A subscription is then configured on the target database:
> >
> > CREATE SUBSCRIPTION sub_test
> >   CONNECTION 'dbname=test host=localhost port=5432 user=user_test'
> >   PUBLICATION pub_test
> >   WITH (create_slot=false, copy_data=false, failover=true);
> >
> > The logical replication slot is active and in failover mode.
> >
> > 2) Starting the physical standby
> >
> > A logical replication slot is successfully created on the standby
> >
> > 3) Cluster switchover
> >
> > The switchover is initiated using the Patroni command:
> >
> > patronictl switchover
> >
> > The operation completes successfully, and roles are reversed in the
> > cluster.
> > ...
> > 4) Issue encountered
> > After the switchover, an error appears in the PostgreSQL logs:
> >
> > 2025-06-10 16:40:58.996 CEST [739829]: [1-1] user=,db=,client=,application= LOG: slot sync worker started
> > 2025-06-10 16:40:59.011 CEST [739829]: [2-1] user=,db=,client=,application= ERROR: exiting from slot synchronization because same name slot "sub_test" already exists on the standby
> > ...
> > 5) Dropping the slot
> >
> > If the slot on the standby is deleted, it is then recreated with
> > synced = true, and at that point, it successfully resynchronizes with the
> > primary slot. Everything works correctly.
> >
> > Question:
> > Why does the synced flag fail to change to true, even though
> > sync_replication_slots is enabled (on)?
>
> Thank you for reporting this. This behavior is expected because overwriting
> existing slots on standbys is not permitted for now. Doing so poses a risk of
> rendering slots created by users for other purposes unusable.
>
> However, if needed, we could permit overwriting when the existing slot has
> failover=true, given that enabling failover for slots on standbys is currently
> disallowed, but this assumption might change in the future if we support
> enabling failover to allow slot syncing to cascading standbys. Alternatively,
> we could introduce options, such as a GUC, to control whether to overwrite
> existing slots though not sure if it's worth it.
>
> From a database user's perspective, it's necessary to clean up any leftover
> slots on a new standby following a switchover, regardless of whether the
> failover slot feature is supported. Because those leftover slots could lead to
> excessive WAL accumulation.
>
>
> Best Regards,
> Hou zj
>
