On Tue, Jan 9, 2024 at 6:39 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> +static bool
> +synchronize_one_slot(WalReceiverConn *wrconn, RemoteSlot *remote_slot)
> {
> ...
> +    /* Slot ready for sync, so sync it. */
> +    else
> +    {
> +        /*
> +         * Sanity check: With hot_standby_feedback enabled and
> +         * invalidations handled appropriately as above, this should never
> +         * happen.
> +         */
> +        if (remote_slot->restart_lsn < slot->data.restart_lsn)
> +            elog(ERROR,
> +                 "cannot synchronize local slot \"%s\" LSN(%X/%X)"
> +                 " to remote slot's LSN(%X/%X) as synchronization"
> +                 " would move it backwards", remote_slot->name,
> +                 LSN_FORMAT_ARGS(slot->data.restart_lsn),
> +                 LSN_FORMAT_ARGS(remote_slot->restart_lsn));
> ...
> }
>
> I was thinking about the above code in the patch and as far as I can
> think this can only occur if the same name slot is re-created with
> prior restart_lsn after the existing slot is dropped. Normally, the
> newly created slot (with the same name) will have higher restart_lsn
> but one can mimic it by copying some older slot by using
> pg_copy_logical_replication_slot().
>
> I don't think as mentioned in comments even if hot_standby_feedback is
> temporarily set to off, the above shouldn't happen. It can only lead
> to invalidated slots on standby.
>
> To close the above race, I could think of the following ways:
> 1. Drop and re-create the slot.
> 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves
> ahead of local_slot's LSN then we can update it; but as mentioned in
> your previous comment, we need to update all other fields as well. If
> we follow this then we probably need to have a check for catalog_xmin
> as well.
>
The second point as mentioned is slightly misleading, so let me try to
rephrase it once again: Emit LOG/WARNING in this case and once
remote_slot's LSN moves ahead of local_slot's LSN then we can update
it; additionally, we need to update all other fields like two_phase as
well. If we follow this then we probably need to have a check for
catalog_xmin as well, along with remote_slot's restart_lsn.

> Now, related to this the other case which needs some handling is what
> if the remote_slot's restart_lsn is greater than local_slot's
> restart_lsn but it is a re-created slot with the same name. In that
> case, I think the other properties like 'two_phase', 'plugin' could be
> different. So, is simply copying those sufficient or do we need to do
> something else as well?
>

Bertrand, Dilip, Sawada-San, and others, please share your opinion on
this problem as I think it is important to handle this race condition.

--
With Regards,
Amit Kapila.
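
To make option 2 concrete, here is a rough standalone sketch of the
decision it describes: skip the sync with a LOG-style message while the
remote slot's restart_lsn or catalog_xmin is behind the local one, and
otherwise copy all fields together. This is not code from the patch.
SketchSlot, sync_slot_sketch() and xid_precedes() are hypothetical names
used only for illustration, the typedefs are simplified stand-ins for
the PostgreSQL ones, and the xid comparison ignores special
transaction IDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the PostgreSQL types; not the real headers. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TransactionId;

/* Hypothetical, reduced view of a slot; the real structs carry more. */
typedef struct SketchSlot
{
    const char     *name;
    XLogRecPtr      restart_lsn;
    TransactionId   catalog_xmin;
    bool            two_phase;
} SketchSlot;

typedef enum { SYNC_UPDATED, SYNC_SKIPPED } SyncResult;

/*
 * Wraparound-aware "a is older than b" check.  Assumption: both values
 * are normal xids; special xids are not handled in this sketch.
 */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Option 2: if the remote slot is behind the local one, log and leave
 * the local slot untouched; otherwise update every field in one go.
 */
static SyncResult
sync_slot_sketch(SketchSlot *local, const SketchSlot *remote)
{
    assert(local != NULL && remote != NULL);

    if (remote->restart_lsn < local->restart_lsn ||
        xid_precedes(remote->catalog_xmin, local->catalog_xmin))
    {
        fprintf(stderr,
                "LOG: skipping sync of slot \"%s\": remote is behind local\n",
                remote->name);
        return SYNC_SKIPPED;
    }

    /* Remote has caught up: copy restart_lsn and the other fields. */
    local->restart_lsn = remote->restart_lsn;
    local->catalog_xmin = remote->catalog_xmin;
    local->two_phase = remote->two_phase;
    return SYNC_UPDATED;
}
```

Note that this only covers the "remote is behind" case. It does not
detect a re-created slot whose restart_lsn is already ahead of the local
one, which is the second question raised above.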