On Thu, Apr 8, 2021 at 7:39 AM Andres Freund <and...@anarazel.de> wrote: > > On 2021-04-07 17:10:37 -0700, Andres Freund wrote: > > I think this can be solved in two different ways: > > > > 1) Hold ReplicationSlotAllocationLock with LW_SHARED across most of > > InvalidateObsoleteReplicationSlots(). That way nobody could re-create a > > new > > slot in the to-be-obsoleted-slot's place. > > > > 2) Atomically check whether the slot needs to be invalidated and try to > > acquire if needed. Don't release ReplicationSlotControlLock between those > > two steps. Signal the owner to release the slot iff we couldn't acquire > > the > > slot. In the latter case wait and then recheck if the slot still needs to > > be dropped. > > > > To me 2) seems better, because we then can also be sure that the slot still > > needs to be obsoleted, rather than potentially doing so unnecessarily. > >
+1. > > > > It looks to me like several of the problems here stem from trying to reuse > > code from ReplicationSlotAcquireInternal() (which before this was just named > > ReplicationSlotAcquire()). I don't think that makes sense, because cases > > like > > this want to check if a condition is true, and acquire it only if so. > > > > IOW, I think this basically needs to look like > > ReplicationSlotsDropDBSlots(), > > except that a different condition is checked, and the if (active_pid) case > > needs to prepare a condition variable, signal the owner and then wait on the > > condition variable, to restart after. > > I'm also confused by the use of ConditionVariableTimedSleep(timeout = > 10). Why do we need a timed sleep here in the first place? And why with > such a short sleep? > > I also noticed that the code is careful to use CHECK_FOR_INTERRUPTS(); - > but is aware it's running in checkpointer. I don't think CFI does much > there? If we are worried about needing to check for interrupts, more > work is needed. > > > Sketch for a fix attached. I did leave the odd > ConditionVariableTimedSleep(10ms) in, because I wasn't sure why it's > there... > I haven't tested the patch but I couldn't spot any problems while reading it. A minor point, don't we need to use ConditionVariableCancelSleep() at some point after doing ConditionVariableTimedSleep? -- With Regards, Amit Kapila.