I'm working with logical replication on PostgreSQL 17, and I'm exploring the new option that makes logical replication slots failover-safe in a highly available environment with physical standby nodes managed by Patroni.
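For reference, slot synchronization is enabled on the standby. The sketch below shows the settings I understand to be required for this in PostgreSQL 17; in my setup Patroni manages primary_conninfo and the physical replication slot, so the slot name pg17_standby is only a placeholder:

# On the standby (postgresql.conf):
sync_replication_slots = on          # start the slot sync worker
hot_standby_feedback = on            # required for slot synchronization
primary_slot_name = 'pg17_standby'   # placeholder; the physical slot used by this standby
# primary_conninfo must also include dbname= so the sync worker can connect

# On the primary (optional, but keeps failover slots from advancing past the standby):
synchronized_standby_slots = 'pg17_standby'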
After a switchover, I see an error message in the PostgreSQL logs and some unexpected behavior. Here are the steps I followed:

1) Setting up a new subscription

Logical replication is established between two databases on the same PostgreSQL instance. A logical replication slot is created on the source database:

SELECT pg_create_logical_replication_slot('sub_test', 'pgoutput', false, false, true);

A subscription is then configured on the target database:

CREATE SUBSCRIPTION sub_test
    CONNECTION 'dbname=test host=localhost port=5432 user=user_test'
    PUBLICATION pub_test
    WITH (create_slot = false, copy_data = false, failover = true);

The logical replication slot is active and in failover mode:

\dRs+
List of subscriptions
+-[ RECORD 1 ]-------+------------------------------------------------------+
| Name               | sub_test                                             |
| Owner              | postgres                                             |
| Enabled            | t                                                    |
| Publication        | {pub_test}                                           |
| Binary             | f                                                    |
| Streaming          | off                                                  |
| Two-phase commit   | d                                                    |
| Disable on error   | f                                                    |
| Origin             | any                                                  |
| Password required  | t                                                    |
| Run as owner?      | f                                                    |
| Failover           | t                                                    |
| Synchronous commit | off                                                  |
| Conninfo           | dbname=test host=localhost port=5432 user=user_test  |
| Skip LSN           | 0/0                                                  |
+--------------------+------------------------------------------------------+

select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+----------------+
| slot_name           | sub_test       |
| plugin              | pgoutput       |
| slot_type           | logical        |
| datoid              | 58458          |
| database            | test           |
| temporary           | f              |
| active              | t              |
| active_pid          | 739313         |
| xmin                |                |
| catalog_xmin        | 1976743        |
| restart_lsn         | 8/5F000028     |
| confirmed_flush_lsn | 8/5F000060     |
| wal_status          | reserved       |
| safe_wal_size       |                |
| two_phase           | f              |
| inactive_since      |                |
| conflicting         | f              |
| invalidation_reason |                |
| failover            | t              |
| synced              | f              |
+---------------------+----------------+

2) Starting the physical standby

The logical replication slot is successfully created on the standby (note synced = t):

select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+-------------------------------+
| slot_name           | sub_test                      |
| plugin              | pgoutput                      |
| slot_type           | logical                       |
| datoid              | 58458                         |
| database            | test                          |
| temporary           | f                             |
| active              | f                             |
| active_pid          |                               |
| xmin                |                               |
| catalog_xmin        | 1976743                       |
| restart_lsn         | 8/5F000028                    |
| confirmed_flush_lsn | 8/5F000060                    |
| wal_status          | reserved                      |
| safe_wal_size       |                               |
| two_phase           | f                             |
| inactive_since      | 2025-06-10 16:30:38.633723+02 |
| conflicting         | f                             |
| invalidation_reason |                               |
| failover            | t                             |
| synced              | t                             |
+---------------------+-------------------------------+

3) Cluster switchover

The switchover is initiated with the Patroni command:

patronictl switchover

The operation completes successfully, and the roles are reversed in the cluster.

4) Issue encountered

After the switchover, an error appears in the PostgreSQL logs of the new standby:

2025-06-10 16:40:58.996 CEST [739829]: [1-1] user=,db=,client=,application= LOG:  slot sync worker started
2025-06-10 16:40:59.011 CEST [739829]: [2-1] user=,db=,client=,application= ERROR:  exiting from slot synchronization because same name slot "sub_test" already exists on the standby

The slot on the new standby is not in sync mode (synced = f):

select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+-------------------------------+
| slot_name           | sub_test                      |
| plugin              | pgoutput                      |
| slot_type           | logical                       |
| datoid              | 58458                         |
| database            | test                          |
| temporary           | f                             |
| active              | f                             |
| active_pid          |                               |
| xmin                |                               |
| catalog_xmin        | 1976743                       |
| restart_lsn         | 8/5F000080                    |
| confirmed_flush_lsn | 8/5F000130                    |
| wal_status          | reserved                      |
| safe_wal_size       |                               |
| two_phase           | f                             |
| inactive_since      | 2025-06-10 16:33:49.573016+02 |
| conflicting         | f                             |
| invalidation_reason |                               |
| failover            | t                             |
| synced              | f                             |
+---------------------+-------------------------------+

In the source code (slotsync.c), it is the check on the synced flag that raises the error:

/* Search for the named slot */
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
{
    bool        synced;

    SpinLockAcquire(&slot->mutex);
    synced = slot->data.synced;
    SpinLockRelease(&slot->mutex);

    /* A user-created slot with the same name exists -> raise ERROR */
    if (!synced)
        ereport(ERROR,
                errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                errmsg("exiting from slot synchronization because same"
                       " name slot \"%s\" already exists on the standby",
                       remote_slot->name));
}

5) Dropping the slot

If the slot on the standby is dropped, it is then recreated with synced = true, and from that point on it resynchronizes correctly with the slot on the primary. Everything then works as expected.
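Concretely, the fix in step 5 amounts to running this on the new standby (the slot sync worker then recreates the slot as a synced copy on its next cycle):

-- on the new standby: drop the leftover user-created slot so the
-- slot sync worker can recreate it with synced = true
SELECT pg_drop_replication_slot('sub_test');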
Question: why does the synced flag of the existing slot fail to change to true, even though sync_replication_slots is enabled (on)?

Thanks for helping,
Fabrice