> On 2020-11-17 12:55:01 -0300, Alvaro Herrera wrote: > > ... ah, but I realize now that this means that we can use shared lock > > here, not exclusive, which is already an enormous improvement. That's > > because ->pgxactoff can only be changed with exclusive lock held; so as > > long as we hold shared, the array item cannot move. > > Uh, wait a second. The acquisition of this lock hasn't been affected by > the snapshot scalability changes, and therefore are unrelated to > ->pgxactoff changing or not.
I'm writing a response trying to thoroughly analyze this, but I also wanted to report that ProcSleep is being a bit generous with what it calls "as quickly as possible" here: LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); /* * Only do it if the worker is not working to protect against Xid * wraparound. */ statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff]; if ((statusFlags & PROC_IS_AUTOVACUUM) && !(statusFlags & PROC_VACUUM_FOR_WRAPAROUND)) { int pid = autovac->pid; StringInfoData locktagbuf; StringInfoData logbuf; /* errdetail for server log */ initStringInfo(&locktagbuf); initStringInfo(&logbuf); DescribeLockTag(&locktagbuf, &lock->tag); appendStringInfo(&logbuf, _("Process %d waits for %s on %s."), MyProcPid, GetLockmodeName(lock->tag.locktag_lockmethodid, lockmode), locktagbuf.data); /* release lock as quickly as possible */ LWLockRelease(ProcArrayLock); The amount of stuff that this is doing with ProcArrayLock held exclusively seems a bit excessive; it sounds like we could copy the values we need first, release the lock, and *then* do all that memory allocation and string printing -- it's a lock of code for such a contended lock. Anytime a process sees itself as blocked by autovacuum and wants to signal it, there's a ProcArrayLock hiccup (I didn't actually measure it, but it's at least five function calls). We could make this more concurrent by copying lock->tag to a local variable, releasing the lock, then doing all the string formatting and printing. See attached quickly.patch. Now, when this code was written (d7318d43d, 2012), this was a LOG message; it was demoted to DEBUG1 later (d8f15c95bec, 2015). I think it would be fair to ... remove the message? Or go back to Simon's original formulation from commit acac68b2bca, which had this message as DEBUG2 without any string formatting. Thoughts?
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c index d1738c65f5..caefff41e2 100644 --- a/src/backend/storage/lmgr/proc.c +++ b/src/backend/storage/lmgr/proc.c @@ -1325,20 +1325,22 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable) int pid = autovac->pid; StringInfoData locktagbuf; StringInfoData logbuf; /* errdetail for server log */ + LOCKTAG locktag_copy; + + /* release lock as quickly as possible */ + locktag_copy = lock->tag; + LWLockRelease(ProcArrayLock); initStringInfo(&locktagbuf); initStringInfo(&logbuf); - DescribeLockTag(&locktagbuf, &lock->tag); + DescribeLockTag(&locktagbuf, &locktag_copy); appendStringInfo(&logbuf, _("Process %d waits for %s on %s."), MyProcPid, - GetLockmodeName(lock->tag.locktag_lockmethodid, + GetLockmodeName(locktag_copy.locktag_lockmethodid, lockmode), locktagbuf.data); - /* release lock as quickly as possible */ - LWLockRelease(ProcArrayLock); - /* send the autovacuum worker Back to Old Kent Road */ ereport(DEBUG1, (errmsg("sending cancel to blocking autovacuum PID %d",