failover logical replication slots

Fabrice Chapuis Tue, 10 Jun 2025 08:47:02 -0700

I'm working with logical replication in a PostgreSQL 17 setup, and I'm
exploring the new option to make replication slots failover-safe in a
highly available environment using physical standby nodes managed by
Patroni.


After a switchover, I encounter an error message in the PostgreSQL logs and
observe unexpected behavior.
Here are the different steps I followed:

1) Setting up a new subscription

Logical replication is established between two databases on the same
PostgreSQL instance.

A logical replication slot is created on the source database:

-- arguments: slot_name, plugin, temporary, twophase, failover
SELECT pg_create_logical_replication_slot('sub_test', 'pgoutput', false, false, true);

A subscription is then configured on the target database:

CREATE SUBSCRIPTION sub_test
    CONNECTION 'dbname=test host=localhost port=5432 user=user_test'
    PUBLICATION pub_test
    WITH (create_slot = false, copy_data = false, failover = true);

The logical replication slot is active and in failover mode.

\dRs+
List of subscriptions
+-[ RECORD 1 ]-------+-----------------------------------------------------+
| Name               | sub_test                                            |
| Owner              | postgres                                            |
| Enabled            | t                                                   |
| Publication        | {pub_test}                                          |
| Binary             | f                                                   |
| Streaming          | off                                                 |
| Two-phase commit   | d                                                   |
| Disable on error   | f                                                   |
| Origin             | any                                                 |
| Password required  | t                                                   |
| Run as owner?      | f                                                   |
| Failover           | t                                                   |
| Synchronous commit | off                                                 |
| Conninfo           | dbname=test host=localhost port=5432 user=user_test |
| Skip LSN           | 0/0                                                 |
+--------------------+-----------------------------------------------------+

select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+----------------+
| slot_name           | sub_test       |
| plugin              | pgoutput       |
| slot_type           | logical        |
| datoid              | 58458          |
| database            | test           |
| temporary           | f              |
| active              | t              |
| active_pid          | 739313         |
| xmin                |                |
| catalog_xmin        | 1976743        |
| restart_lsn         | 8/5F000028     |
| confirmed_flush_lsn | 8/5F000060     |
| wal_status          | reserved       |
| safe_wal_size       |                |
| two_phase           | f              |
| inactive_since      |                |
| conflicting         | f              |
| invalidation_reason |                |
| failover            | t              |
| synced              | f              |
+---------------------+----------------+

2) Starting the physical standby

A logical replication slot is successfully created on the standby, with
synced = t:

select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+-------------------------------+
| slot_name           | sub_test                      |
| plugin              | pgoutput                      |
| slot_type           | logical                       |
| datoid              | 58458                         |
| database            | test                          |
| temporary           | f                             |
| active              | f                             |
| active_pid          |                               |
| xmin                |                               |
| catalog_xmin        | 1976743                       |
| restart_lsn         | 8/5F000028                    |
| confirmed_flush_lsn | 8/5F000060                    |
| wal_status          | reserved                      |
| safe_wal_size       |                               |
| two_phase           | f                             |
| inactive_since      | 2025-06-10 16:30:38.633723+02 |
| conflicting         | f                             |
| invalidation_reason |                               |
| failover            | t                             |
| synced              | t                             |
+---------------------+-------------------------------+
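
The full row is long, so a narrower query makes it easier to compare the two nodes side by side. This is only a sketch: it assumes PostgreSQL 17, where the failover and synced columns exist in pg_replication_slots, and it adds pg_is_in_recovery() so the output shows which role the node currently has.

```sql
-- Run on both the primary and the standby and compare the results.
SELECT pg_is_in_recovery() AS is_standby,   -- t on a standby, f on the primary
       slot_name,
       active,
       restart_lsn,
       confirmed_flush_lsn,
       failover,
       synced
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

At this stage the primary is expected to report failover = t, synced = f and the standby failover = t, synced = t, matching the two listings above.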

3) Cluster switchover

The switchover is initiated using the Patroni command:

patronictl switchover

The operation completes successfully, and roles are reversed in the cluster.

4) Issue encountered

After the switchover, an error appears in the PostgreSQL logs:

2025-06-10 16:40:58.996 CEST [739829]: [1-1] user=,db=,client=,application= LOG: slot sync worker started
2025-06-10 16:40:59.011 CEST [739829]: [2-1] user=,db=,client=,application= ERROR: exiting from slot synchronization because same name slot "sub_test" already exists on the standby

The slot on the new standby (the former primary) is not in sync mode
(synced = f):

select * from pg_replication_slots where slot_type = 'logical';

+-[ RECORD 1 ]--------+-------------------------------+
| slot_name           | sub_test                      |
| plugin              | pgoutput                      |
| slot_type           | logical                       |
| datoid              | 58458                         |
| database            | test                          |
| temporary           | f                             |
| active              | f                             |
| active_pid          |                               |
| xmin                |                               |
| catalog_xmin        | 1976743                       |
| restart_lsn         | 8/5F000080                    |
| confirmed_flush_lsn | 8/5F000130                    |
| wal_status          | reserved                      |
| safe_wal_size       |                               |
| two_phase           | f                             |
| inactive_since      | 2025-06-10 16:33:49.573016+02 |
| conflicting         | f                             |
| invalidation_reason |                               |
| failover            | t                             |
| synced              | f                             |
+---------------------+-------------------------------+

In the source code (slotsync.c), the check for the synced flag triggers an
error:

/* Search for the named slot */
if ((slot = SearchNamedReplicationSlot(remote_slot->name, true))) {
    bool synced;

    SpinLockAcquire(&slot->mutex);
    synced = slot->data.synced;
    SpinLockRelease(&slot->mutex);

    /* A user-created slot with the same name exists → raise ERROR */
    if (!synced)
        ereport(ERROR,
                errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                errmsg("exiting from slot synchronization because same"
                       " name slot \"%s\" already exists on the standby",
                       remote_slot->name));
}
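
The condition tested by this code can be approximated from SQL. The query below is a sketch, not part of the original investigation: run on a standby, it lists failover-enabled logical slots that are not marked as synced, which are the slots the check above would treat as user-created if a slot with the same name exists on the primary.

```sql
-- Run on the standby. Returns no rows on a primary because of the
-- pg_is_in_recovery() filter.
SELECT slot_name, failover, synced, inactive_since
FROM pg_replication_slots
WHERE slot_type = 'logical'
  AND failover
  AND NOT synced
  AND pg_is_in_recovery();
```

In the scenario described here, the query would be expected to return sub_test on the new standby after the switchover.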

5) Dropping the slot

If the slot on the standby is deleted, it is then recreated with synced =
true, and at that point, it successfully resynchronizes with the primary
slot. Everything works correctly.
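
For reference, the workaround can be sketched as follows. It assumes the slot name sub_test from the steps above and that the slot is inactive on the new standby, since pg_drop_replication_slot() refuses to drop an active slot.

```sql
-- Run on the new standby (the former primary).
SELECT pg_drop_replication_slot('sub_test');

-- After the slot sync worker has run again, the slot should reappear
-- with synced = t.
SELECT slot_name, failover, synced
FROM pg_replication_slots
WHERE slot_name = 'sub_test';
```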

Question:
Why does the synced flag fail to change to true, even though
sync_replication_slots is enabled (on)?
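
To rule out a configuration problem on the standby, the documented slot-synchronization prerequisites can be inspected with a query like the sketch below. It does not explain the behavior by itself; it only confirms that the worker is configured the way PostgreSQL 17 expects (sync_replication_slots and hot_standby_feedback on, primary_slot_name set, wal_level at logical, and a dbname in primary_conninfo).

```sql
-- Run on the standby. primary_conninfo may be hidden from
-- unprivileged roles, so run this as a superuser.
SELECT name, setting
FROM pg_settings
WHERE name IN ('sync_replication_slots',
               'hot_standby_feedback',
               'primary_slot_name',
               'primary_conninfo',
               'wal_level')
ORDER BY name;
```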

Thanks for helping

Fabrice
