On Fri, Dec 22, 2023 at 7:59 PM Bertrand Drouvot
<bertranddrouvot...@gmail.com> wrote:
>
> Hi,
>
> On Fri, Dec 22, 2023 at 04:02:21PM +0530, shveta malik wrote:
> > PFA v53. Changes are:
>
> Thanks!
>
> > patch002:
> > 2) Addressed comments in [2] for v52-002.
> > 3) Fixed CFBot failure. The failure was caused by an assert in
> > wait_for_primary_slot_catchup() for null confirmed_lsn received. In
> > wait_for_primary_slot_catchup(), we had an assumption that if
> > restart_lsn is valid and 'conflicting' is also false, then we must
> > have non-null confirmed_lsn. But this is not true. It is possible to
> > get null values for confirmed_lsn and catalog_xmin if on the primary
> > server the slot is just created with a valid restart_lsn and slot-sync
> > worker has fetched the slot before the primary server could set valid
> > confirmed_lsn and catalog_xmin. In
> > pg_create_logical_replication_slot(), there is a small window between
> > CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets
> > restart_lsn and DecodingContextFindStartpoint() which sets
> > confirmed_lsn. If the slot-sync worker fetches the slot in this
> > window, confirmed_lsn received will be NULL. Corrected the code to
> > remove assert and added one additional condition that confirmed_lsn
> > should be valid before moving the slot to 'r'.
> >
>
> Looking at v53-0002 commit message:
>
> It states:
>
> "
> If a logical slot on the primary is valid but is invalidated on the standby,
> then that slot is dropped and recreated on the standby in next sync-cycle.
> "
>
> and one of the reasons mentioned is:
>
> "
> - The primary changes wal_level to a level lower than logical.
> "
>
> I think that as long as there is still a logical replication slot on the
> primary that should not be possible. The primary should fail to start with
> messages like:
>
> "
> 2023-12-22 14:06:09.281 UTC [31824] FATAL: logical replication slot
> "logical_slot" exists, but wal_level < logical
> "

Yes, right. The primary fails to start in such a case.

>
> Now, if:
>
> - The standby is shut down
> - All the logical replication slots are removed on the primary
> - wal_level is set to < logical on the primary and it is restarted
>
> Then when the standby starts, the "synced" slots will be invalidated and later
> removed but not re-created on the next sync-cycle (because they don't exist
> anymore on the primary).
>
> Worth rewording that part a bit?

Yes, I will reword these details in the commit message. Thanks!

> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com
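
For reference, the extra condition described in item 3 of the quoted changelog could look roughly like the sketch below. This is only an illustration and not the code in the patch: RemoteSlotSketch, LsnSketch, INVALID_LSN and remote_slot_is_sync_ready() are hypothetical names invented here, and the sketch assumes that an unset LSN is represented as 0. It shows that a valid restart_lsn together with conflicting = false is not enough, because confirmed_lsn can still be unset if the slot was fetched inside the creation window.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN value; 0 is assumed to mean "not set". */
typedef uint64_t LsnSketch;
#define INVALID_LSN ((LsnSketch) 0)

/* Hypothetical snapshot of one slot as fetched from the primary. */
typedef struct RemoteSlotSketch
{
    LsnSketch restart_lsn;    /* set first, when WAL is reserved */
    LsnSketch confirmed_lsn;  /* set later, once a start point is found */
    bool      conflicting;
} RemoteSlotSketch;

/*
 * Return true only if the fetched slot can be treated as ready.
 * Instead of asserting that confirmed_lsn is set, an unset value is
 * treated as "not ready yet", so the caller can retry on a later cycle.
 */
static bool
remote_slot_is_sync_ready(const RemoteSlotSketch *slot)
{
    if (slot->conflicting)
        return false;
    if (slot->restart_lsn == INVALID_LSN)
        return false;
    /* The additional condition: confirmed_lsn may still be unset. */
    if (slot->confirmed_lsn == INVALID_LSN)
        return false;
    return true;
}
```

A slot fetched inside the window (restart_lsn set, confirmed_lsn still unset) is reported as not ready here, rather than tripping an assert.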