On Thu, Jan 18, 2024 at 10:31 AM Peter Smith <smithpb2...@gmail.com> wrote: > > I have one question about the new code in v63-0002. > > ====== > src/backend/replication/logical/slotsync.c > > 1. ReplSlotSyncWorkerMain > > + Assert(SlotSyncWorker->pid == InvalidPid); > + > + /* > + * Startup process signaled the slot sync worker to stop, so if meanwhile > + * postmaster ended up starting the worker again, exit. > + */ > + if (SlotSyncWorker->stopSignaled) > + { > + SpinLockRelease(&SlotSyncWorker->mutex); > + proc_exit(0); > + } > > Can we be sure a worker crash can't occur (in ShutDownSlotSync?) in > such a way that SlotSyncWorker->stopSignaled was already assigned > true, but SlotSyncWorker->pid was not yet reset to InvalidPid; > > e.g. Is the Assert above still OK?
We are good with the Assert here. I tried below cases: 1) When slotsync worker is say killed using 'kill', it is considered as SIGTERM; slot sync worker invokes 'slotsync_worker_onexit()' before going down and thus sets SlotSyncWorker->pid = InvalidPid. This means when it is restarted (considering we have put the breakpoints in such a way that postmaster had already reached do_start_bgworker() before promotion finished), it is able to see stopSignaled set but pid is InvalidPid and thus we are good. 2) Another case is when we kill slot sync worker using 'kill -9' (or say we make it crash), in such a case, postmaster signals each sibling process to quit (including startup process) and cleans up the shared memory used by each (including SlotSyncWorker). In such a case promotion fails. And if slot sync worker is started again, it will find pid as InvalidPid. So we are good. thanks Shveta