On Tue, Dec 5, 2023 at 7:38 PM Drouvot, Bertrand
<bertranddrouvot...@gmail.com> wrote:
>
> On 12/5/23 12:32 PM, Amit Kapila wrote:
> > On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.ma...@gmail.com> wrote:
> >>
> >> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> >> <bertranddrouvot...@gmail.com> wrote:
> >>>>
> >>>
> >>> Maybe another option could be to have the walreceiver a way to let the slot sync
> >>> worker knows that it (the walreceiver) was not able to start due to non existing
> >>> replication slot on the primary? (that way we'd avoid the slot sync worker having
> >>> to talk to the primary).
> >>
> >> Few points:
> >> 1) I think if we do it, we should do it in generic way i.e. slotsync
> >> worker should go to no-op if walreceiver is not able to start due to
> >> any reason and not only due to invalid primary_slot_name.
> >> 2) Secondly, slotsync worker needs to make sure it has synced the
> >> slots so far i.e. worker should not go to no-op immediately on seeing
> >> missing WalRcv process if there are pending slots to be synced.
> >>
> >
> > Won't it be better to just ping and check the validity of
> > 'primary_slot_name' at the start of slot-sync and if it is changed
> > anytime? I think it would be better to avoid adding dependency on
> > walreciever state as that sounds like needless complexity.
>
> I think the overall extra complexity is linked to the fact that we first
> want to ensure that the slots are in sync before shutting down the
> sync slot worker.
>
> I think than talking to the primary or relying on the walreceiver state
> is "just" what would trigger the decision to shutdown the sync slot worker.
>
> Relying on the walreceiver state looks better to me (as it avoids possibly
> useless round trips with the primary).
>
But the round trip will happen only once at the beginning, and again
only if the user changes the GUC primary_slot_name, which shouldn't be
that often.

> Also the walreceiver could be down for multiple reasons, and I think there
> is no point of having a sync slot worker running if the slots are in sync and
> there is no walreceiver running (even if primary_slot_name is a valid one).
>

I feel that is indirectly relying on the fact that the primary won't
advance logical slots unless the physical standby has consumed data.
Now, it is possible that the slot-sync worker lags behind and still
needs to sync more data for slots, in which case it makes sense for the
slot-sync worker to stay alive. I think we can try to avoid checking
the walreceiver status till we can get more data, to avoid the problem
I mentioned, but it doesn't sound like a clean way to achieve our
purpose.

-- 
With Regards,
Amit Kapila.
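
For illustration only, the cost argument above (one round trip at
startup, plus one per change of primary_slot_name) can be modelled with
a small toy in Python. This is not PostgreSQL code: the class
SlotSyncWorkerModel, its cycle() method and the slot_exists_on_primary
callback are all hypothetical names, and the real worker would also
have to handle pending slots before going to no-op, which this sketch
leaves out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SlotSyncWorkerModel:
    """Toy model of the proposal; all names here are hypothetical."""

    # Stand-in for asking the primary whether the slot exists (one round trip).
    slot_exists_on_primary: Callable[[str], bool]
    round_trips: int = 0
    _validated_name: Optional[str] = None
    _is_valid: bool = False

    def cycle(self, primary_slot_name: str) -> str:
        """Run one sync cycle and return "sync" or "no-op"."""
        # Contact the primary only at startup or when the setting has changed.
        if primary_slot_name != self._validated_name:
            self.round_trips += 1
            self._is_valid = self.slot_exists_on_primary(primary_slot_name)
            self._validated_name = primary_slot_name
        return "sync" if self._is_valid else "no-op"


# Assumed scenario: the primary only has a slot named "standby_slot".
worker = SlotSyncWorkerModel(lambda name: name == "standby_slot")

# Many cycles with an unchanged setting cost a single round trip.
results = [worker.cycle("standby_slot") for _ in range(1000)]

# The user then changes the setting to a slot that does not exist.
results.append(worker.cycle("typo_slot"))

print(worker.round_trips, results[0], results[-1])
```

In this model the number of round trips depends on how often the
setting changes, not on how many sync cycles run, and the walreceiver
state is never consulted.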