Hi,

On Fri, Jan 12, 2024 at 08:42:39AM +0530, Amit Kapila wrote:
> On Thu, Jan 11, 2024 at 9:11 PM Bertrand Drouvot
> <bertranddrouvot...@gmail.com> wrote:
> >
> > On Thu, Jan 11, 2024 at 04:22:56PM +0530, Amit Kapila wrote:
> > > >
> > > > To close the above race, I could think of the following ways:
> > > > 1. Drop and re-create the slot.
> > > > 2. Emit LOG/WARNING in this case and once remote_slot's LSN moves
> > > > ahead of local_slot's LSN then we can update it; but as mentioned in
> > > > your previous comment, we need to update all other fields as well. If
> > > > we follow this then we probably need to have a check for catalog_xmin
> > > > as well.
> >
> > IIUC, this would be a sync slot (so not usable until promotion) that could
> > not be used anyway (invalidated), so I'll vote for drop / re-create then.
> >
>
> No, it can happen for non-sync slots as well.

Yeah, I meant that we could decide to drop/re-create only for sync slots.

>
> > > > Now, related to this the other case which needs some handling is what
> > > > if the remote_slot's restart_lsn is greater than local_slot's
> > > > restart_lsn but it is a re-created slot with the same name. In that
> > > > case, I think the other properties like 'two_phase', 'plugin' could be
> > > > different. So, is simply copying those sufficient or do we need to do
> > > > something else as well?
> > > >
> > >
> >
> > I'm not sure to follow here. If the remote slot is re-created then it would
> > be also dropped / re-created locally, or am I missing something?
> >
>
> As our slot-syncing mechanism is asynchronous (from time to time we
> check the slot information on primary), isn't it possible that the
> same name slot is dropped and recreated between slot-sync worker's
> checks?
>

Yeah, I should have thought harder ;-)

So for this case, let's imagine that we had an easy way to detect that a
remote slot has been dropped/re-created: I think we would then drop and
re-create it on the standby too.

If so, I think we should update all the fields (that we're currently
updating in the "create locally" case) when we detect that (at least) one of
the following differs:

- dboid
- plugin
- two_phase

Maybe the "best" approach would be to have a way to detect that a slot has
been re-created on the primary (but that would mean relying on more than the
slot name to "identify" a slot, and probably adding a new member to the
struct to do so).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
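
To make the "one of the following differs" check above concrete, here is a minimal sketch in Python rather than PostgreSQL C. Everything in it is an assumption made for illustration: SlotInfo, IDENTITY_FIELDS, changed_identity_fields() and needs_full_refresh() are hypothetical names, not existing PostgreSQL structs or functions, and the slot is reduced to the fields mentioned in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SlotInfo:
    """Hypothetical, simplified view of a logical slot (not a PostgreSQL struct)."""
    name: str
    dboid: int
    plugin: str
    two_phase: bool


# Properties that cannot change unless the slot was dropped and re-created
# under the same name (the three listed in the mail above).
IDENTITY_FIELDS = ("dboid", "plugin", "two_phase")


def changed_identity_fields(remote: SlotInfo, local: SlotInfo) -> List[str]:
    """Return the names of the identity fields that differ between the two slots."""
    return [f for f in IDENTITY_FIELDS if getattr(remote, f) != getattr(local, f)]


def needs_full_refresh(remote: SlotInfo, local: SlotInfo) -> bool:
    """True when at least one identity field differs, i.e. the local slot
    should get all the fields updated as in the "create locally" case."""
    return bool(changed_identity_fields(remote, local))


# Example: same slot name, but two_phase differs on the primary.
old = SlotInfo("sub1", dboid=16384, plugin="pgoutput", two_phase=False)
new = SlotInfo("sub1", dboid=16384, plugin="pgoutput", two_phase=True)
print(changed_identity_fields(new, old))  # ['two_phase']
```

Note that this heuristic cannot see a slot that was re-created with identical dboid, plugin and two_phase, which is the reason for suggesting a dedicated member in the struct to identify a slot beyond its name.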