I wrote:
> I don't think it's correct to re-initialize the SpinDelayStatus each
> time around the outer loop.  That state should persist through the
> entire acquire operation, as it does in a regular spinlock acquire.
> As this stands, it resets the delay to minimum each time around the
> outer loop, and I bet it is that behavior, not the RNG, that's to blame
> for what he's seeing.

After thinking about this some more, it is fairly clear that that *is*
a mistake that can cause a thundering-herd problem.  Assume we have
two or more backends waiting in perform_spin_delay, and for whatever
reason the scheduler wakes them up simultaneously.  They see the
LW_FLAG_LOCKED bit clear (because whoever had the lock when they went
to sleep is long gone) and iterate the outer loop.  One gets the lock;
the rest go back to sleep.  And they will all sleep exactly
MIN_DELAY_USEC, because they all reset their SpinDelayStatus.
Lather, rinse, repeat.  If the s_lock code were being used as
intended, they would soon acquire different cur_delay settings;
but that never happens.  That is not the RNG's fault.
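
To see the difference concretely, here is a minimal sketch of the two
patterns (not the actual lwlock.c code; try_lock() and lock_is_held() are
hypothetical stand-ins for the atomic fetch-or and the plain read).  Once
perform_spin_delay() starts sleeping, it lengthens cur_delay on each call,
so the waiters' delays can diverge only if that state survives the outer
retries:

#include "storage/s_lock.h"	/* SpinDelayStatus and the delay functions */

extern bool try_lock(void);	/* hypothetical: one atomic lock attempt */
extern bool lock_is_held(void);	/* hypothetical: plain read of the lock word */

/* Broken: delay state is re-created on every trip around the outer loop */
void
acquire_broken(void)
{
	while (!try_lock())
	{
		SpinDelayStatus delayStatus;

		init_local_spin_delay(&delayStatus);	/* cur_delay back to minimum */
		while (lock_is_held())
			perform_spin_delay(&delayStatus);
		finish_spin_delay(&delayStatus);	/* escalation thrown away */
	}
}

/* Intended: one SpinDelayStatus spans the whole acquire operation */
void
acquire_fixed(void)
{
	SpinDelayStatus delayStatus;

	init_local_spin_delay(&delayStatus);
	while (!try_lock())
	{
		while (lock_is_held())
			perform_spin_delay(&delayStatus);	/* cur_delay keeps growing */
	}
	finish_spin_delay(&delayStatus);
}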

So I think we need something like the attached.

                        regards, tom lane

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index b1e388dc7c..54c84b9d67 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -857,46 +857,49 @@ static void
 LWLockWaitListLock(LWLock *lock)
 {
 	uint32		old_state;
-#ifdef LWLOCK_STATS
-	lwlock_stats *lwstats;
-	uint32		delays = 0;
 
-	lwstats = get_lwlock_stats_entry(lock);
-#endif
+	/*
+	 * Try once to acquire the lock, before we go to the trouble of setting up
+	 * state for the spin loop.
+	 */
+	old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
+	if (!(old_state & LW_FLAG_LOCKED))
+		return;					/* got lock */
 
-	while (true)
 	{
-		/* always try once to acquire lock directly */
-		old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
-		if (!(old_state & LW_FLAG_LOCKED))
-			break;				/* got lock */
+		SpinDelayStatus delayStatus;
+#ifdef LWLOCK_STATS
+		lwlock_stats *lwstats = get_lwlock_stats_entry(lock);
+#endif
 
-		/* and then spin without atomic operations until lock is released */
-		{
-			SpinDelayStatus delayStatus;
+		init_local_spin_delay(&delayStatus);
 
-			init_local_spin_delay(&delayStatus);
+		while (true)
+		{
+			/* try again to acquire lock directly */
+			old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
+			if (!(old_state & LW_FLAG_LOCKED))
+				break;			/* got lock */
 
+			/* spin without atomic operations until lock is released */
 			while (old_state & LW_FLAG_LOCKED)
 			{
 				perform_spin_delay(&delayStatus);
 				old_state = pg_atomic_read_u32(&lock->state);
 			}
-#ifdef LWLOCK_STATS
-			delays += delayStatus.delays;
-#endif
-			finish_spin_delay(&delayStatus);
+
+			/*
+			 * Retry. The lock might obviously already be re-acquired by the
+			 * time we're attempting to get it again.
+			 */
 		}
 
-		/*
-		 * Retry. The lock might obviously already be re-acquired by the time
-		 * we're attempting to get it again.
-		 */
-	}
+		finish_spin_delay(&delayStatus);
 
 #ifdef LWLOCK_STATS
-	lwstats->spin_delay_count += delays;
+		lwstats->spin_delay_count += delayStatus.delays;
 #endif
+	}
 }
 
 /*
