On 2020-Nov-18, Andres Freund wrote: > In 13 this is: > LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > MyPgXact->vacuumFlags |= PROC_IN_VACUUM; > if (params->is_wraparound) > MyPgXact->vacuumFlags |= PROC_VACUUM_FOR_WRAPAROUND; > LWLockRelease(ProcArrayLock); > > Lowering this to a shared lock doesn't seem right, at least without a > detailed comment explaining why it's safe. Because GetSnapshotData() etc > look at all procs with just an LW_SHARED ProcArrayLock, changing > vacuumFlags without a lock means that two concurrent horizon > computations could come to a different result. > > I'm not saying it's definitely wrong to relax things here, but I'm not > sure we've evaluated it sufficiently.
True. Let's evaluate it. I changed three spots where statusFlags bits are written: a) StartupDecodingContext: sets PROC_IN_LOGICAL_DECODING b) ReplicationSlotRelease: removes PROC_IN_LOGICAL_DECODING c) vacuum_rel: sets PROC_IN_VACUUM and potentially PROC_VACUUM_FOR_WRAPAROUND Who reads these flags? PROC_IN_LOGICAL_DECODING is read by: * ComputeXidHorizons * GetSnapshotData PROC_IN_VACUUM is read by: * GetCurrentVirtualXIDs * ComputeXidHorizons * GetSnapshotData * ProcArrayInstallImportedXmin PROC_VACUUM_FOR_WRAPAROUND is read by: * ProcSleep ProcSleep: no bug here. The flags are checked to see if we should kill() the process (used when autovac blocks some other process). The lock here appears to be inconsequential, since we release it before we do kill(); so strictly speaking, there is still a race condition where the autovac process could reset the flag after we read it and before we get a chance to signal it. The lock level autovac uses to set the flag is not relevant either. ProcArrayInstallImportedXmin: This one is just searching for a matching backend; not affected by the flags. PROC_IN_LOGICAL_DECODING: Oddly enough, I think the reset of PROC_IN_LOGICAL_DECODING in ReplicationSlotRelease might be the most problematic one of the lot. That's because a proc's xmin that had been ignored all along by ComputeXidHorizons, will now be included in the computation. Adding asserts that proc->xmin and proc->xid are InvalidXid by the time we reset the flag, I got hits in pg_basebackup, test_decoding and subscription tests. I think it's OK for ComputeXidHorizons (since it just means that a vacuum that reads a later will remove less rows.) But in GetSnapshotData it is just not correct to have the Xmin go backwards. Therefore it seems to me that this code has a bug independently of the lock level used. GetCurrentVirtualXIDs, ComputeXidHorizons, GetSnapshotData: In these cases, what we need is that the code computes some xmin (or equivalent computation) based on a set of transactions that exclude those marked with the flags. The behavior we want is that if some transaction is marked as vacuum, we ignore the Xid/Xmin *if there is one*. In other words, if there's no Xid/Xmin, then the flag is not important. So if we can ensure that the flag is set first, and the Xid/xmin is installed later, that's sufficient correctness and we don't need to hold exclusive lock. But if we can't ensure that, then we must use exclusive lock, because otherwise we risk another process seeing our Xid first and not our flag, which would be bad. In other words, my conclusion is that there definitely is a bug here and I am going to restore the use of exclusive lock for setting the statusFlags. GetSnapshotData has an additional difficulty -- we do the UINT32_ACCESS_ONCE(ProcGlobal->xid[i]) read *before* testing the bit. So it's not valid to set the bit without locking out GSD, regardless of any barriers we had; if we want this to be safe, we'd have to change this so that the flag is read first, and we read the xid only afterwards, with a read barrier. I *think* we could relax the lock if we had a write barrier in between: set the flag first, issue a write barrier, set the Xid. (I have to admit I'm confused about what needs to happen in the read case: read the bit first, potentially skip the PGPROC entry; but can we read the Xid ahead of reading the flag, and if we do read the xid afterwards, do we need a read barrier in between?) Given this uncertainty, I'm not proposing to relax the lock from exclusive to shared. I didn't get around to reading ComputeXidHorizons, but it seems like it'd have the same problem as GSD.