On Tue, Dec 5, 2023 at 10:38 AM shveta malik <shveta.ma...@gmail.com> wrote:
>
> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> <bertranddrouvot...@gmail.com> wrote:
> >
> > >
> > >> ~~~
> > >> 4. primary_slot_name GUC value test:
> > >>
> > >> When standby is started with a non-existing primary_slot_name, the
> > >> wal-receiver gives an error but the slot-sync worker does not raise
> > >> any error/warning. It is no-op though as it has a check 'if
> > >> (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'. Is this
> > >> okay or shall the slot-sync worker too raise an error and exit?
> > >>
> > >> In another case, when standby is started with valid primary_slot_name,
> > >> but it is changed to some invalid value in runtime, then walreceiver
> > >> starts giving error but the slot-sync worker keeps on running. In this
> > >> case, unlike the previous case, it even did not go to no-op mode (as
> > >> it sees valid WalRcv->latestWalEnd from the earlier run) and keep
> > >> pinging primary repeatedly for slots. Shall here it should error out
> > >> or at least be no-op until we give a valid primary_slot_name?
> > >>
> > >
> >
> > Nice catch, thanks!
> >
> > > I reviewed it. There is no way to test the existence/validity of
> > > 'primary_slot_name' on standby without making a connection to primary.
> > > If primary_slot_name is invalid from the start, slot-sync worker will
> > > be no-op (as you tested) as WalRecv->latestWalENd will be invalid, and
> > > if 'primary_slot_name' is changed to invalid on runtime, slot-sync
> > > worker will still keep on pinging primary. But that should be okay (in
> > > fact needed) as it needs to sync at-least the previous slot's
> > > positions (in case it is delayed in doing so for some reason earlier).
> > > And once the slots are up-to-date on standby, even if worker pings
> > > primary, it will not see any change in slots lsns and thus go for
> > > longer nap. I think, it is not worth the effort to introduce the
> > > complexity of checking validity of 'primary_slot_name' on primary from
> > > standby for this rare scenario.
> > >
> >
> > Maybe another option could be to have the walreceiver a way to let the
> > slot sync worker knows that it (the walreceiver) was not able to start
> > due to non existing replication slot on the primary? (that way we'd
> > avoid the slot sync worker having to talk to the primary).
>
> Few points:
> 1) I think if we do it, we should do it in generic way i.e. slotsync
> worker should go to no-op if walreceiver is not able to start due to
> any reason and not only due to invalid primary_slot_name.
> 2) Secondly, slotsync worker needs to make sure it has synced the
> slots so far i.e. worker should not go to no-op immediately on seeing
> missing WalRcv process if there are pending slots to be synced.
>
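
To restate the two cases quoted above in code form, here is a minimal
standalone model. It is not code from the patch: WalRcvModel and
slotsync_is_noop() are hypothetical names used only for illustration, and
the XLogRecPtr definitions are simplified stand-ins for the PostgreSQL
ones. It only shows why a check on latestWalEnd alone cannot tell the two
cases apart in the way one might want.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's WAL position type and macros. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)

/* Hypothetical model of the one walreceiver field the check looks at. */
typedef struct WalRcvModel
{
	XLogRecPtr	latestWalEnd;	/* last WAL end position seen from primary */
} WalRcvModel;

/*
 * Hypothetical helper: returns true if the slot-sync worker would skip
 * this cycle, mirroring the quoted check
 * 'if (XLogRecPtrIsInvalid(WalRcv->latestWalEnd)) do nothing'.
 */
static bool
slotsync_is_noop(const WalRcvModel *walrcv)
{
	return XLogRecPtrIsInvalid(walrcv->latestWalEnd);
}
```

If primary_slot_name is invalid from the start, the walreceiver never
streams, latestWalEnd stays invalid, and slotsync_is_noop() returns true.
If the name becomes invalid at runtime, latestWalEnd still holds the value
from the earlier run, so the function returns false and the worker keeps
pinging the primary.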

Won't it be better to just ping the primary and check the validity of
'primary_slot_name' at the start of slot-sync and whenever it is changed?
I think it would be better to avoid adding a dependency on walreceiver
state, as that sounds like needless complexity.

-- 
With Regards,
Amit Kapila.