On Tue, Jun 10, 2025 at 11:46 PM Fabrice Chapuis wrote:
> I'm working with logical replication in a PostgreSQL 17 setup, and I'm
> exploring the new option to make replication slots failover safe in a highly
> available environment using physical standby nodes managed by Patroni.
>  
> After a switchover, I encounter an error message in the PostgreSQL logs and 
> observe unexpected behavior.
> Here are the different steps I followed:
>  
> 1) Setting up a new subscription 
>  
> Logical replication is established between two databases on the same 
> PostgreSQL instance.
>  
> A logical replication slot is created on the source database:
>  
> SELECT pg_create_logical_replication_slot('sub_test', 'pgoutput', false, 
> false, true);
>  
> A subscription is then configured on the target database:
>  
> CREATE SUBSCRIPTION sub_test CONNECTION 'dbname=test host=localhost port=5432 
> user=user_test' 
> PUBLICATION pub_test WITH (create_slot=false, copy_data=false, failover=true);
>  
> The logical replication slot is active and in failover mode.
>  
> 2) Starting the physical standby
>  
> A logical replication slot is successfully created on the standby
>  
> 3) Cluster switchover
>  
> The switchover is initiated using the Patroni command:
>  
> patronictl switchover
>  
> The operation completes successfully, and roles are reversed in the cluster.
> ...
> 4) Issue encountered
> After the switchover, an error appears in the PostgreSQL logs:
>  
> 2025-06-10 16:40:58.996 CEST [739829]: [1-1] user=,db=,client=,application= 
> LOG: slot sync worker started
> 2025-06-10 16:40:59.011 CEST [739829]: [2-1] user=,db=,client=,application= 
> ERROR: exiting from slot synchronization because same name slot "sub_test" 
> already exists on the standby
> ...
> 5) Dropping the slot
>  
> If the slot on the standby is deleted, it is then recreated with synced = 
> true, and at that point, it successfully resynchronizes with the primary 
> slot. Everything works correctly.
>  
> Question:
> Why does the synced flag fail to change to true, even though 
> sync_replication_slots is enabled (on)?

Thank you for reporting this. This behavior is expected because overwriting
existing slots on standbys is not permitted for now. Doing so poses a risk of
rendering slots created by users for other purposes unusable.
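
For illustration only, a query along these lines, run on the new standby,
should list the local slots that the slot sync worker will refuse to
overwrite. It is a sketch that assumes PostgreSQL 17, where
pg_replication_slots exposes the failover and synced columns:

```sql
-- Run on the new standby (the former primary).
-- Slots with synced = false were created locally rather than by the slot
-- sync worker; one of these sharing a name with a failover slot on the
-- primary is what triggers the "same name slot ... already exists" error.
SELECT slot_name, slot_type, active, failover, synced
FROM pg_replication_slots
WHERE NOT synced;
```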
 
However, if needed, we could permit overwriting when the existing slot has
failover=true, since enabling failover for slots on standbys is currently
disallowed. That assumption might no longer hold in the future if we support
enabling failover on standbys to allow slot syncing to cascading standbys.
Alternatively, we could introduce an option, such as a GUC, to control whether
existing slots are overwritten, though I'm not sure it's worth it.
 
From a database user's perspective, it's necessary to clean up any leftover
slots on a new standby following a switchover, regardless of whether the
failover slot feature is supported, because those leftover slots could lead to
excessive WAL accumulation.
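
As a rough sketch of that cleanup, assuming the leftover slot is the
'sub_test' slot from the report and that it is inactive on the new standby,
something like the following could be run there. The slot name is taken from
the report; adjust it for other setups, and check the first query's output
before dropping anything:

```sql
-- Run on the new standby (the former primary).
-- 1) See how much WAL the leftover slot is holding back. pg_current_wal_lsn()
--    is not available during recovery, so use the replay position instead.
SELECT slot_name,
       synced,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_last_wal_replay_lsn(), restart_lsn))
           AS retained_wal
FROM pg_replication_slots
WHERE slot_name = 'sub_test';

-- 2) Drop the leftover slot only if it is a local (synced = false),
--    inactive slot. With sync_replication_slots = on, the slot sync worker
--    should then recreate it with synced = true.
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name = 'sub_test'
  AND NOT synced
  AND NOT active;
```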
 

Best Regards,
Hou zj
