ReplicationSlotsComputeRequiredXmin seems pretty questionable

Tom Lane Thu, 20 Aug 2020 09:59:21 -0700

While trying to make sense of Adam Sjøgren's problem [1], I found
myself staring at ReplicationSlotsComputeRequiredXmin() in slot.c.
It seems to me that that is very shaky code, on two different
grounds:


1. Sometimes it's called with ProcArrayLock already held exclusively.
This means that any delay in acquiring the ReplicationSlotControlLock
translates directly into a hold on ProcArrayLock; in other words,
every acquisition of the ReplicationSlotControlLock is just as bad
for concurrency as an acquisition of ProcArrayLock.  While I didn't
see any places that were doing really obviously slow things while
holding ReplicationSlotControlLock, this is disturbing.  Do we really
need it to be like that?

2. On the other hand, the code is *releasing* the
ReplicationSlotControlLock before it calls
ProcArraySetReplicationSlotXmin, and that seems like a flat out
concurrency bug.  How can we be sure that the values we're storing
into the shared xmin fields aren't stale by the time we acquire
the ProcArrayLock (in the case where we didn't hold it already)?
I'm concerned that in the worst case this code could make the
shared xmin fields go backwards.

Both of these issues could be solved, I think, if we got rid of
the provision for calling with ProcArrayLock already held and
moved the ProcArraySetReplicationSlotXmin call inside the hold
of ReplicationSlotControlLock.  But maybe I'm missing something
about why that would be worse.

                        regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/87364kdsim.fsf%40tullinup.koldfront.dk

ReplicationSlotsComputeRequiredXmin seems pretty questionable

Reply via email to