Amit Kapila <amit.kapil...@gmail.com> writes: > On Thu, Aug 20, 2020 at 10:28 PM Tom Lane <t...@sss.pgh.pa.us> wrote: >> 2. On the other hand, the code is *releasing* the >> ReplicationSlotControlLock before it calls >> ProcArraySetReplicationSlotXmin, and that seems like a flat out >> concurrency bug.
> It is not clear to me how those values can go backward. After releasing ReplicationSlotControlLock, that code is holding no lock at all (in the already_locked=false case I'm concerned about). Thus the scenario to consider is: 1. Walsender A runs ReplicationSlotsComputeRequiredXmin, computes some perfectly-valid xmins, releases ReplicationSlotControlLock, amd then gets swapped out to Timbuktu. 2. Time passes and the "true values" of those xmins advance thanks to other walsender activity. 3. Walsender B runs ReplicationSlotsComputeRequiredXmin, computes some perfectly-valid xmins, and successfully stores them in the procarray. 4. Walsender A returns from never-never land, and stores its now quite stale results in the procarray, causing the globally visible xmins to go backwards from where they were after step 3. I see no mechanism in the code that prevents this scenario. On reflection I'm not even very sure that the code change I'm suggesting would prevent it. It'd prevent walsenders from entering or exiting until we've updated the procarray, but there's nothing to stop the furthest-back walsender from advancing its values. There may be some argument why this can't lead to a problem, but I don't see any comments making such an argument, either. regards, tom lane