On 28/09/2023 12:58, Heikki Linnakangas wrote:
On 28/10/2022 06:56, Thomas Munro wrote:
One example is heavyweight lock wakeups.  If you run BEGIN; LOCK TABLE
t; ... and then N other sessions wait in SELECT * FROM t;, and then
you run ... COMMIT;, you'll see the first session wake all the others
while it still holds the partition lock itself.  They'll all wake up
and begin to re-acquire the same partition lock in exclusive mode,
immediately go back to sleep on*that*  wait list, and then wake each
other up one at a time in a chain.  We could avoid the first
double-bounce by not setting the latches until after we've released
the partition lock.  We could avoid the rest of them by not
re-acquiring the partition lock at all, which ... if I'm reading right
... shouldn't actually be necessary in modern PostgreSQL?  Or if there
is another reason to re-acquire then maybe the comment should be
updated.

ISTM that the change to not re-aqcuire the lock in ProcSleep is
independent from the other changes. Let's split that off to a separate
patch.

I spent some time on splitting that off. I had to start from scratch, because commit 2346df6fc373df9c5ab944eebecf7d3036d727de conflicted heavily with your patch.

I split ProcSleep() into two functions: JoinWaitQueue does the first part of ProcSleep(), adding the process to the wait queue and checking for the dontWait and early deadlock cases. What remains in ProcSleep() does just the sleeping part. JoinWaitQueue is called with the partition lock held, and ProcSleep() is called without it. This way, the partition lock is acquired and released in the same function (LockAcquireExtended), avoiding awkward "lock is held on enter, but might be released on exit depending on the outcome" logic.

This is actually a set of 8 patches. The first 7 are independent tiny fixes and refactorings in these functions. See individual commit messages.

I agree it should be safe. Acquiring a lock just to hold off interrupts
is overkill anwyway, HOLD_INTERRUPTS() would be enough.
LockErrorCleanup() uses HOLD_INTERRUPTS() already.

There are no CHECK_FOR_INTERRUPTS() in GrantAwaitedLock(), so cancel/die
interrupts can't happen here. But could we add HOLD_INTERRUPTS(), just
pro forma, to document the assumption? It's a little awkward: you really
should hold interrupts until the caller has done "awaitedLock = NULL;".
So it's not quite enough to add a pair of HOLD_ and RESUME_INTERRUPTS()
at the end of ProcSleep(). You'd need to do the HOLD_INTERRUPTS() in
ProcSleep() and require the caller to do RESUME_INTERRUPTS(). In a
sense, ProcSleep downgrades the lock on the partition to just holding
off interrupts.

I didn't use HOLD/RESUME_INTERRUPTS() after all. Like you did, I'm just relying on the fact that there are no CHECK_FOR_INTERRUPTS() calls in places where they might cause trouble. Those sections are short, so I think it's fine.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 989069eaca788cfd61010e5f88c0feb359b40180 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 22 Jul 2024 01:21:20 +0300
Subject: [PATCH 1/8] Remove LOCK_PRINT() call that could point to garbage

If a deadlock is detected in ProcSleep, it removes the process from
the wait queue and the shared LOCK and PROCLOCK entries. As soon as it
does that, the other process holding the lock might release it, and
remove the LOCK entry altogether. So on return from ProcSleep(), it's
possible that the LOCK entry is no longer valid.
---
 src/backend/storage/lmgr/lock.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 0400a507779..05b3a243b5c 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1861,8 +1861,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 			 * now.
 			 */
 			awaitedLock = NULL;
-			LOCK_PRINT("WaitOnLock: aborting on lock",
-					   locallock->lock, locallock->tag.mode);
 			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
-- 
2.39.2

From 22bc7a3b7826dfbb991b5d63ad1fba52bb297923 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 22 Jul 2024 00:43:13 +0300
Subject: [PATCH 2/8] Fix comment in LockReleaseAll() on when locallock->nLock
 can be zero

We reach this case also e.g. when a deadlock is detected, not only
when we run out of memory.
---
 src/backend/storage/lmgr/lock.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 05b3a243b5c..7eb5c2e5226 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -2208,9 +2208,8 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	while ((locallock = (LOCALLOCK *) hash_seq_search(&status)) != NULL)
 	{
 		/*
-		 * If the LOCALLOCK entry is unused, we must've run out of shared
-		 * memory while trying to set up this lock.  Just forget the local
-		 * entry.
+		 * If the LOCALLOCK entry is unused, something must've gone wrong
+		 * while trying to acquire this lock.  Just forget the local entry.
 		 */
 		if (locallock->nLocks == 0)
 		{
-- 
2.39.2

From ae8d12bab05ede48166d15312b222e046aae14a0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 22 Jul 2024 13:23:59 +0300
Subject: [PATCH 3/8] Set MyProc->heldLocks in ProcSleep

Previously, the ProcSleep() caller was respnsible for setting it, and
we had comments to remind about that. But it seems simpler to make
ProcSleep() itself responsible for it. ProcSleep() already set the
other info about the lock its waiting for (waitLock, waitProcLock and
waitLockMode), so seems natural for it to set heldLocks too.
---
 src/backend/storage/lmgr/lock.c |  8 --------
 src/backend/storage/lmgr/proc.c | 10 ++++++----
 2 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 7eb5c2e5226..489073259b6 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1047,11 +1047,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	}
 	else
 	{
-		/*
-		 * Set bitmask of locks this process already holds on this object.
-		 */
-		MyProc->heldLocks = proclock->holdMask;
-
 		/*
 		 * Sleep till someone wakes me up. We do this even in the dontWait
 		 * case, because while trying to go to sleep, we may discover that we
@@ -1808,9 +1803,6 @@ MarkLockClear(LOCALLOCK *locallock)
 /*
  * WaitOnLock -- wait to acquire a lock
  *
- * Caller must have set MyProc->heldLocks to reflect locks already held
- * on the lockable object by this process.
- *
  * The appropriate partition lock must be held at entry, and will still be
  * held at exit.
  */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 1b23efb26f3..5c1d2792c3d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1040,9 +1040,6 @@ AuxiliaryPidGetProc(int pid)
 /*
  * ProcSleep -- put a process to sleep on the specified lock
  *
- * Caller must have set MyProc->heldLocks to reflect locks already held
- * on the lockable object by this process (under all XIDs).
- *
  * It's not actually guaranteed that we need to wait when this function is
  * called, because it could be that when we try to find a position at which
  * to insert ourself into the wait queue, we discover that we must be inserted
@@ -1072,7 +1069,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
 	dclist_head *waitQueue = &lock->waitProcs;
 	PGPROC	   *insert_before = NULL;
-	LOCKMASK	myHeldLocks = MyProc->heldLocks;
+	LOCKMASK	myHeldLocks;
 	TimestampTz standbyWaitStart = 0;
 	bool		early_deadlock = false;
 	bool		allow_autovacuum_cancel = true;
@@ -1080,6 +1077,11 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	/*
+	 * Set bitmask of locks this process already holds on this object.
+	 */
+	myHeldLocks = MyProc->heldLocks = proclock->holdMask;
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
-- 
2.39.2

From 05b8b4a7222795377f0a967b6408e61aa6b8c62f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 22 Jul 2024 13:12:57 +0300
Subject: [PATCH 4/8] Move TRACE calls into WaitOnLock()

LockAcquire is a long and complex function. Pushing more stuff to its
subroutines makes it a little more manageable.
---
 src/backend/storage/lmgr/lock.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 489073259b6..bac950efdf3 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1052,23 +1052,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		 * case, because while trying to go to sleep, we may discover that we
 		 * can acquire the lock immediately after all.
 		 */
-
-		TRACE_POSTGRESQL_LOCK_WAIT_START(locktag->locktag_field1,
-										 locktag->locktag_field2,
-										 locktag->locktag_field3,
-										 locktag->locktag_field4,
-										 locktag->locktag_type,
-										 lockmode);
-
 		WaitOnLock(locallock, owner, dontWait);
 
-		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
-										locktag->locktag_field2,
-										locktag->locktag_field3,
-										locktag->locktag_field4,
-										locktag->locktag_type,
-										lockmode);
-
 		/*
 		 * NOTE: do not do any material change of state between here and
 		 * return.  All required changes in locktable state must have been
@@ -1812,6 +1797,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
 
+	TRACE_POSTGRESQL_LOCK_WAIT_START(locallock->tag.lock.locktag_field1,
+									 locallock->tag.lock.locktag_field2,
+									 locallock->tag.lock.locktag_field3,
+									 locallock->tag.lock.locktag_field4,
+									 locallock->tag.lock.locktag_type,
+									 locallock->tag.mode);
+
 	LOCK_PRINT("WaitOnLock: sleeping on lock",
 			   locallock->lock, locallock->tag.mode);
 
@@ -1882,6 +1874,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 
 	LOCK_PRINT("WaitOnLock: wakeup on lock",
 			   locallock->lock, locallock->tag.mode);
+
+	TRACE_POSTGRESQL_LOCK_WAIT_DONE(locallock->tag.lock.locktag_field1,
+									locallock->tag.lock.locktag_field2,
+									locallock->tag.lock.locktag_field3,
+									locallock->tag.lock.locktag_field4,
+									locallock->tag.lock.locktag_type,
+									locallock->tag.mode);
 }
 
 /*
-- 
2.39.2

From 6f70b1f3fad97a89725706ee974ad2665471c6d6 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 22 Jul 2024 17:23:33 +0300
Subject: [PATCH 5/8] Remove redundant lockAwaited global variable

We had two global (static) variables to hold the LOCALLOCK that we are
currently acquiring: awaitedLock in lock.c and lockAwaited in
proc.c. They were set at slightly different times, but when they were
both set, they were the same thing. Remove the lockAwaited variable
and rely on awaitedLock everywhere.
---
 src/backend/storage/lmgr/lock.c |  9 +++++++++
 src/backend/storage/lmgr/proc.c | 30 ++++--------------------------
 src/backend/tcop/postgres.c     |  2 +-
 src/include/storage/lock.h      |  2 ++
 src/include/storage/proc.h      |  1 -
 5 files changed, 16 insertions(+), 28 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index bac950efdf3..9035145169e 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1771,6 +1771,12 @@ GrantAwaitedLock(void)
 	GrantLockLocal(awaitedLock, awaitedOwner);
 }
 
+LOCALLOCK *
+GetAwaitedLock(void)
+{
+	return awaitedLock;
+}
+
 /*
  * MarkLockClear -- mark an acquired lock as "clear"
  *
@@ -1867,6 +1873,9 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	}
 	PG_END_TRY();
 
+	/*
+	 * We no longer want LockErrorCleanup to do anything.
+	 */
 	awaitedLock = NULL;
 
 	/* reset ps display to remove the suffix */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 5c1d2792c3d..7a207d0104d 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -79,9 +79,6 @@ PROC_HDR   *ProcGlobal = NULL;
 NON_EXEC_STATIC PGPROC *AuxiliaryProcs = NULL;
 PGPROC	   *PreparedXactProcs = NULL;
 
-/* If we are waiting for a lock, this points to the associated LOCALLOCK */
-static LOCALLOCK *lockAwaited = NULL;
-
 static DeadLockState deadlock_state = DS_NOT_YET_CHECKED;
 
 /* Is a deadlock check pending? */
@@ -706,18 +703,6 @@ HaveNFreeProcs(int n, int *nfree)
 	return (*nfree == n);
 }
 
-/*
- * Check if the current process is awaiting a lock.
- */
-bool
-IsWaitingForLock(void)
-{
-	if (lockAwaited == NULL)
-		return false;
-
-	return true;
-}
-
 /*
  * Cancel any pending wait for lock, when aborting a transaction, and revert
  * any strong lock count acquisition for a lock being acquired.
@@ -729,6 +714,7 @@ IsWaitingForLock(void)
 void
 LockErrorCleanup(void)
 {
+	LOCALLOCK  *lockAwaited;
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
 
@@ -737,6 +723,7 @@ LockErrorCleanup(void)
 	AbortStrongLockAcquire();
 
 	/* Nothing to do if we weren't waiting for a lock */
+	lockAwaited = GetAwaitedLock();
 	if (lockAwaited == NULL)
 	{
 		RESUME_INTERRUPTS();
@@ -1212,9 +1199,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
-	/* mark that we are waiting for a lock */
-	lockAwaited = locallock;
-
 	/*
 	 * Release the lock table's partition lock.
 	 *
@@ -1639,17 +1623,11 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 							NULL, false);
 
 	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
+	 * Re-acquire the lock table's partition lock, because the caller expects
+	 * us to still hold it.
 	 */
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-	/*
-	 * We no longer want LockErrorCleanup to do anything.
-	 */
-	lockAwaited = NULL;
-
 	/*
 	 * If we got the lock, be sure to remember it in the locallock table.
 	 */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ea517f4b9bb..fce65c7313e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3052,7 +3052,7 @@ ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
 			/*
 			 * If we aren't waiting for a lock we can never deadlock.
 			 */
-			if (!IsWaitingForLock())
+			if (GetAwaitedLock() == NULL)
 				return;
 
 			/* Intentional fall through to check wait for pin */
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index cc1f6e78c39..45752ac1408 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -583,6 +583,8 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
+extern LOCALLOCK *GetAwaitedLock(void);
+
 extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
 extern Size LockShmemSize(void);
 extern LockData *GetLockStatusData(void);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 7d3fc2bfa60..71e05727e54 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -476,7 +476,6 @@ extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock,
 extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
 extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
 extern void CheckDeadLockAlert(void);
-extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-- 
2.39.2

From 7cb00c043d47490263aa67d2f1dfce9c40f02af0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 22 Jul 2024 17:27:16 +0300
Subject: [PATCH 6/8] Update local lock table in ProcSleep's caller

ProcSleep is now responsible only for the shared state.  Seems a
little nicer that way.
---
 src/backend/storage/lmgr/lock.c | 4 +++-
 src/backend/storage/lmgr/proc.c | 7 -------
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 9035145169e..8d1d57c0533 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1043,7 +1043,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	{
 		/* No conflict with held or previously requested locks */
 		GrantLock(lock, proclock, lockmode);
-		GrantLockLocal(locallock, owner);
 	}
 	else
 	{
@@ -1123,6 +1122,9 @@ LockAcquireExtended(const LOCKTAG *locktag,
 		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
 	}
 
+	/* The lock was granted to us.  Update the local lock entry accordingly */
+	GrantLockLocal(locallock, owner);
+
 	/*
 	 * Lock state is fully up-to-date now; if we error out after this, no
 	 * special error cleanup is required.
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 7a207d0104d..8e74a04b932 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1152,7 +1152,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 				{
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
-					GrantAwaitedLock();
 					return PROC_WAIT_STATUS_OK;
 				}
 
@@ -1628,12 +1627,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	 */
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
 
-	/*
-	 * If we got the lock, be sure to remember it in the locallock table.
-	 */
-	if (MyProc->waitStatus == PROC_WAIT_STATUS_OK)
-		GrantAwaitedLock();
-
 	/*
 	 * We don't have to do anything else, because the awaker did all the
 	 * necessary update of the lock table and MyProc.
-- 
2.39.2

From 94804a2380d18107040887fa424919eeac97ba90 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 22 Jul 2024 17:38:06 +0300
Subject: [PATCH 7/8] Release partition lock a little earlier

We don't need to hold the lock to prevent die/cancel interrupts, they
are only processed at explicit CHECK_FOR_INTERRUPTS() points now.
---
 src/backend/storage/lmgr/lock.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 8d1d57c0533..52f2867ffea 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1043,6 +1043,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	{
 		/* No conflict with held or previously requested locks */
 		GrantLock(lock, proclock, lockmode);
+		LWLockRelease(partitionLock);
 	}
 	else
 	{
@@ -1118,8 +1119,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 				elog(ERROR, "LockAcquire failed");
 			}
 		}
+
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
 		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
+		LWLockRelease(partitionLock);
 	}
 
 	/* The lock was granted to us.  Update the local lock entry accordingly */
@@ -1131,8 +1134,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
-- 
2.39.2

From 0df6db8607c5d10d14ae4b7bbc5077ad394b4e6b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 22 Jul 2024 21:53:18 +0300
Subject: [PATCH 8/8] Split ProcSleep function into JoinWaitQueue and ProcSleep

Split ProcSleep into two functions: JoinWaitQueue and ProcSleep.
JoinWaitQueue is called while holding the partition lock, and inserts
the current process to the wait queue, while ProcSleep() does the
actual sleeping. ProcSleep() is now called without holding the
partition lock, and it no longer re-acquires the partition lock before
returning. That makes the wakeup a little cheaper. Once upon a time,
re-acquiring the partition lock was ago to prevent a signal handler
from longjmping out at a bad time, but nowadays our signal handlers
just set flags, and longjmping can only happen at points where we
explicitly run CHECK_FOR_INTERRUPTS().

If JoinWaitQueue detects an "early deadlock" before even joining the
wait queue, it returns without changing the shared lock entry, leaving
the cleanup of the shared lock entry to the caller. This makes early
deadlock behave the same as the dontWait=true case.

One small side-effect of this refactoring is that we now only set the
'ps' title to say "waiting" when we actually enter the sleep, not when
the lock is skipped because dontWait=true, or when a deadlock is
detected early before entering the sleep.

Based on Thomas Munro's earlier patch and observation that ProcSleep
doesn't really need to re-acquire the partition lock.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/lock.c | 182 ++++++++++++++++----------------
 src/backend/storage/lmgr/proc.c | 107 +++++++++++--------
 src/include/storage/proc.h      |   6 +-
 3 files changed, 156 insertions(+), 139 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 52f2867ffea..c51c98f2665 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -360,8 +360,7 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner,
-					   bool dontWait);
+static ProcWaitStatus WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
@@ -1047,85 +1046,105 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	}
 	else
 	{
-		/*
-		 * Sleep till someone wakes me up. We do this even in the dontWait
-		 * case, because while trying to go to sleep, we may discover that we
-		 * can acquire the lock immediately after all.
-		 */
-		WaitOnLock(locallock, owner, dontWait);
+		ProcWaitStatus waitResult;
 
 		/*
-		 * NOTE: do not do any material change of state between here and
-		 * return.  All required changes in locktable state must have been
-		 * done when the lock was granted to us --- see notes in WaitOnLock.
+		 * Join the lock's wait queue.  We do this even in the dontWait case,
+		 * because while joining the queue, we may discover that we can
+		 * acquire the lock immediately after all.
 		 */
+		waitResult = JoinWaitQueue(locallock, lockMethodTable, dontWait);
 
-		/*
-		 * Check the proclock entry status. If dontWait = true, this is an
-		 * expected case; otherwise, it will only happen if something in the
-		 * ipc communication doesn't work correctly.
-		 */
-		if (!(proclock->holdMask & LOCKBIT_ON(lockmode)))
+		if (waitResult == PROC_WAIT_STATUS_ERROR)
 		{
+			/*
+			 * We're not getting the lock because a deadlock was detected
+			 * already while trying to join the wait queue, or because we
+			 * would have to wait but the caller requested no blocking.
+			 *
+			 * Undo the changes to shared entries before releasing the
+			 * partition lock.
+			 */
 			AbortStrongLockAcquire();
 
+			if (proclock->holdMask == 0)
+			{
+				uint32		proclock_hashcode;
+
+				proclock_hashcode = ProcLockHashCode(&proclock->tag,
+													 hashcode);
+				dlist_delete(&proclock->lockLink);
+				dlist_delete(&proclock->procLink);
+				if (!hash_search_with_hash_value(LockMethodProcLockHash,
+												 &(proclock->tag),
+												 proclock_hashcode,
+												 HASH_REMOVE,
+												 NULL))
+					elog(PANIC, "proclock table corrupted");
+			}
+			else
+				PROCLOCK_PRINT("LockAcquire: did not join wait queue", proclock);
+			lock->nRequested--;
+			lock->requested[lockmode]--;
+			LOCK_PRINT("LockAcquire: did not join wait queue",
+					   lock, lockmode);
+			Assert((lock->nRequested > 0) &&
+				   (lock->requested[lockmode] >= 0));
+			Assert(lock->nGranted <= lock->nRequested);
+			LWLockRelease(partitionLock);
+			if (locallock->nLocks == 0)
+				RemoveLocalLock(locallock);
+
 			if (dontWait)
 			{
-				/*
-				 * We can't acquire the lock immediately.  If caller specified
-				 * no blocking, remove useless table entries and return
-				 * LOCKACQUIRE_NOT_AVAIL without waiting.
-				 */
-				if (proclock->holdMask == 0)
-				{
-					uint32		proclock_hashcode;
-
-					proclock_hashcode = ProcLockHashCode(&proclock->tag,
-														 hashcode);
-					dlist_delete(&proclock->lockLink);
-					dlist_delete(&proclock->procLink);
-					if (!hash_search_with_hash_value(LockMethodProcLockHash,
-													 &(proclock->tag),
-													 proclock_hashcode,
-													 HASH_REMOVE,
-													 NULL))
-						elog(PANIC, "proclock table corrupted");
-				}
-				else
-					PROCLOCK_PRINT("LockAcquire: NOWAIT", proclock);
-				lock->nRequested--;
-				lock->requested[lockmode]--;
-				LOCK_PRINT("LockAcquire: conditional lock failed",
-						   lock, lockmode);
-				Assert((lock->nRequested > 0) &&
-					   (lock->requested[lockmode] >= 0));
-				Assert(lock->nGranted <= lock->nRequested);
-				LWLockRelease(partitionLock);
-				if (locallock->nLocks == 0)
-					RemoveLocalLock(locallock);
 				if (locallockp)
 					*locallockp = NULL;
 				return LOCKACQUIRE_NOT_AVAIL;
 			}
 			else
+			{
+				DeadLockReport();
+				/* DeadLockReport() will not return */
+			}
+		}
+
+		/*
+		 * We are now in the lock queue, or the lock was already granted.  If
+		 * queued, go to sleep.
+		 */
+		if (waitResult == PROC_WAIT_STATUS_WAITING)
+		{
+			Assert(!dontWait);
+			PROCLOCK_PRINT("LockAcquire: sleeping on lock", proclock);
+			LOCK_PRINT("LockAcquire: sleeping on lock", lock, lockmode);
+			LWLockRelease(partitionLock);
+
+			waitResult = WaitOnLock(locallock, owner);
+
+			/*
+			 * NOTE: do not do any material change of state between here and
+			 * return.  All required changes in locktable state must have been
+			 * done when the lock was granted to us --- see notes in WaitOnLock.
+			 */
+
+			if (waitResult == PROC_WAIT_STATUS_ERROR)
 			{
 				/*
-				 * We should have gotten the lock, but somehow that didn't
-				 * happen. If we get here, it's a bug.
+				 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
+				 * now.
 				 */
-				PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
-				LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
-				LWLockRelease(partitionLock);
-				elog(ERROR, "LockAcquire failed");
+				Assert(!dontWait);
+				DeadLockReport();
+				/* DeadLockReport() will not return */
 			}
 		}
-
-		PROCLOCK_PRINT("LockAcquire: granted", proclock);
-		LOCK_PRINT("LockAcquire: granted", lock, lockmode);
-		LWLockRelease(partitionLock);
+		else
+			LWLockRelease(partitionLock);
+		Assert(waitResult == PROC_WAIT_STATUS_OK);
 	}
 
 	/* The lock was granted to us.  Update the local lock entry accordingly */
+	Assert((proclock->holdMask & LOCKBIT_ON(lockmode)) != 0);
 	GrantLockLocal(locallock, owner);
 
 	/*
@@ -1797,14 +1816,12 @@ MarkLockClear(LOCALLOCK *locallock)
 /*
  * WaitOnLock -- wait to acquire a lock
  *
- * The appropriate partition lock must be held at entry, and will still be
- * held at exit.
+ * This is a wrapper around ProcSleep, with extra tracing and bookkeeping.
  */
-static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
+static ProcWaitStatus
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 {
-	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
-	LockMethod	lockMethodTable = LockMethods[lockmethodid];
+	ProcWaitStatus result;
 
 	TRACE_POSTGRESQL_LOCK_WAIT_START(locallock->tag.lock.locktag_field1,
 									 locallock->tag.lock.locktag_field2,
@@ -1813,12 +1830,13 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 									 locallock->tag.lock.locktag_type,
 									 locallock->tag.mode);
 
-	LOCK_PRINT("WaitOnLock: sleeping on lock",
-			   locallock->lock, locallock->tag.mode);
-
 	/* adjust the process title to indicate that it's waiting */
 	set_ps_display_suffix("waiting");
 
+	/*
+	 * Record the fact that we are waiting for a lock, so that
+	 * LockErrorCleanup will clean up if cancel/die happens.
+	 */
 	awaitedLock = locallock;
 	awaitedOwner = owner;
 
@@ -1841,28 +1859,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	 */
 	PG_TRY();
 	{
-		/*
-		 * If dontWait = true, we handle success and failure in the same way
-		 * here. The caller will be able to sort out what has happened.
-		 */
-		if (ProcSleep(locallock, lockMethodTable, dontWait) != PROC_WAIT_STATUS_OK
-			&& !dontWait)
-		{
-
-			/*
-			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
-			 * now.
-			 */
-			awaitedLock = NULL;
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
-
-			/*
-			 * Now that we aren't holding the partition lock, we can give an
-			 * error report including details about the detected deadlock.
-			 */
-			DeadLockReport();
-			/* not reached */
-		}
+		result = ProcSleep(locallock);
 	}
 	PG_CATCH();
 	{
@@ -1884,15 +1881,14 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, bool dontWait)
 	/* reset ps display to remove the suffix */
 	set_ps_display_remove_suffix();
 
-	LOCK_PRINT("WaitOnLock: wakeup on lock",
-			   locallock->lock, locallock->tag.mode);
-
 	TRACE_POSTGRESQL_LOCK_WAIT_DONE(locallock->tag.lock.locktag_field1,
 									locallock->tag.lock.locktag_field2,
 									locallock->tag.lock.locktag_field3,
 									locallock->tag.lock.locktag_field4,
 									locallock->tag.lock.locktag_type,
 									locallock->tag.mode);
+
+	return result;
 }
 
 /*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 8e74a04b932..6d0adbb8018 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1025,7 +1025,7 @@ AuxiliaryPidGetProc(int pid)
 
 
 /*
- * ProcSleep -- put a process to sleep on the specified lock
+ * JoinWaitQueue -- join the wait queue on the specified lock
  *
  * It's not actually guaranteed that we need to wait when this function is
  * called, because it could be that when we try to find a position at which
@@ -1034,20 +1034,24 @@ AuxiliaryPidGetProc(int pid)
  * we get the lock immediately. Because of this, it's sensible for this function
  * to have a dontWait argument, despite the name.
  *
- * The lock table's partition lock must be held at entry, and will be held
- * at exit.
+ * On entry, the caller has already set up LOCK and PROCLOCK entries to
+ * reflect that we have "requested" the lock.  The caller is responsible for
+ * cleaning that up, if we end up not joining the queue after all.
  *
- * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR
- * if not (if dontWait = true, we would have had to wait; if dontWait = false,
- * this is a deadlock).
+ * The lock table's partition lock must be held at entry, and is still held
+ * at exit.  The caller must release it before calling ProcSleep().
  *
- * ASSUME: that no one will fiddle with the queue until after
- *		we release the partition lock.
+ * Result is one of the following:
+ *
+ *  PROC_WAIT_STATUS_OK       - lock was immediately granted
+ *  PROC_WAIT_STATUS_WAITING  - joined the wait queue; call ProcSleep()
+ *  PROC_WAIT_STATUS_ERROR    - immediate deadlock was detected, or would
+ *                              need to wait and dontWait == true
  *
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
+JoinWaitQueue(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
@@ -1057,13 +1061,16 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 	dclist_head *waitQueue = &lock->waitProcs;
 	PGPROC	   *insert_before = NULL;
 	LOCKMASK	myHeldLocks;
-	TimestampTz standbyWaitStart = 0;
 	bool		early_deadlock = false;
-	bool		allow_autovacuum_cancel = true;
-	bool		logged_recovery_conflict = false;
-	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
+	/*
+	 * Set bitmask of locks this process already holds on this object.
+	 */
+	myHeldLocks = MyProc->heldLocks = proclock->holdMask;
+
 	/*
 	 * Set bitmask of locks this process already holds on this object.
 	 */
@@ -1164,6 +1171,13 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 		}
 	}
 
+	/*
+	 * If we detected deadlock, give up without waiting.  This must agree with
+	 * CheckDeadLock's recovery code.
+	 */
+	if (early_deadlock)
+		return PROC_WAIT_STATUS_ERROR;
+
 	/*
 	 * At this point we know that we'd really need to sleep. If we've been
 	 * commanded not to do that, bail out.
@@ -1188,31 +1202,43 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 
 	MyProc->waitStatus = PROC_WAIT_STATUS_WAITING;
 
-	/*
-	 * If we detected deadlock, give up without waiting.  This must agree with
-	 * CheckDeadLock's recovery code.
-	 */
-	if (early_deadlock)
-	{
-		RemoveFromWaitQueue(MyProc, hashcode);
-		return PROC_WAIT_STATUS_ERROR;
-	}
+	return PROC_WAIT_STATUS_WAITING;
+}
 
-	/*
-	 * Release the lock table's partition lock.
-	 *
-	 * NOTE: this may also cause us to exit critical-section state, possibly
-	 * allowing a cancel/die interrupt to be accepted. This is OK because we
-	 * have recorded the fact that we are waiting for a lock, and so
-	 * LockErrorCleanup will clean up if cancel/die happens.
-	 */
-	LWLockRelease(partitionLock);
+/*
+ * ProcSleep -- put process to sleep waiting on lock
+ *
+ * This must be called when JoinWaitQueue() returns PROC_WAIT_STATUS_WAITING.
+ * Returns after the lock has been granted, or if a deadlock is detected.  Can
+ * also bail out with ereport(ERROR), if some other error condition, or a
+ * timeout or cancellation is triggered.
+ *
+ * Result is one of the following:
+ *
+ *  PROC_WAIT_STATUS_OK      - lock was granted
+ *  PROC_WAIT_STATUS_ERROR   - a deadlock was detected
+ */
+ProcWaitStatus
+ProcSleep(LOCALLOCK *locallock)
+{
+	LOCKMODE	lockmode = locallock->tag.mode;
+	LOCK	   *lock = locallock->lock;
+	uint32		hashcode = locallock->hashcode;
+	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
+	TimestampTz standbyWaitStart = 0;
+	bool		allow_autovacuum_cancel = true;
+	bool		logged_recovery_conflict = false;
+	ProcWaitStatus myWaitStatus;
+
+	/* The caller must've armed the on-error cleanup mechanism */
+	Assert(GetAwaitedLock() == locallock);
+	Assert(!LWLockHeldByMe(partitionLock));
 
 	/*
-	 * Also, now that we will successfully clean up after an ereport, it's
-	 * safe to check to see if there's a buffer pin deadlock against the
-	 * Startup process.  Of course, that's only necessary if we're doing Hot
-	 * Standby and are not the Startup process ourselves.
+	 * Now that we will successfully clean up after an ereport, it's safe to
+	 * check to see if there's a buffer pin deadlock against the Startup
+	 * process.  Of course, that's only necessary if we're doing Hot Standby
+	 * and are not the Startup process ourselves.
 	 */
 	if (RecoveryInProgress() && !InRecovery)
 		CheckRecoveryConflictDeadlock();
@@ -1621,17 +1647,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, bool dontWait)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock, because the caller expects
-	 * us to still hold it.
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We don't have to do anything else, because the awaker did all the
-	 * necessary update of the lock table and MyProc.
+	 * necessary updates of the lock table and MyProc. (The caller is
+	 * responsible for updating the local lock table.)
 	 */
-	return MyProc->waitStatus;
+	return myWaitStatus;
 }
 
 
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 71e05727e54..89dd7578377 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -470,9 +470,9 @@ extern int	GetStartupBufferPinWaitBufId(void);
 extern bool HaveNFreeProcs(int n, int *nfree);
 extern void ProcReleaseLocks(bool isCommit);
 
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock,
-								LockMethod lockMethodTable,
-								bool dontWait);
+extern ProcWaitStatus JoinWaitQueue(LOCALLOCK *locallock,
+									LockMethod lockMethodTable, bool dontWait);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock);
 extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
 extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
 extern void CheckDeadLockAlert(void);
-- 
2.39.2

Reply via email to