On 2020-06-05 17:19:26 -0700, Andres Freund wrote:
> Hi,
>
> On 2020-06-04 19:33:02 -0700, Andres Freund wrote:
> > But it looks like that code is currently buggy (and looks like it always
> > has been), because we don't look at the nested argument when
> > initializing the semaphore.  So we currently allocate too many
> > semaphores, without benefiting from them :(.
>
> I wrote a patch for this, and when I got around to testing it, I
> found that our tests currently don't pass when using both
> --disable-spinlocks and --disable-atomics. Turns out to not be related
> to the issue above, but the global barrier support added in 13.
>
> That *reads* two 64 bit atomics in a signal handler. Which is normally
> fine, but not at all cool when atomics (or just 64 bit atomics) are
> backed by spinlocks. Because we can "self interrupt" while already
> holding the spinlock.
>
> It looks to me that that's a danger whenever 64bit atomics are backed by
> spinlocks, not just when both --disable-spinlocks and --disable-atomics
> are used. But I suspect that it's really hard to hit the tiny window of
> danger when those options aren't used. While we have buildfarm animals
> testing each of those separately, we don't have one that tests both
> together...
>
> I'm not really sure what to do about that issue. The easiest thing
> would probably be to change the barrier generation to 32bit (which
> doesn't have to use locks for reads in any situation).  I tested doing
> that, and it fixes the hangs for me.
>
>
> Randomly noticed while looking at the code:
>       uint64          flagbit = UINT64CONST(1) << (uint64) type;
>
> that shouldn't be 64bit, right?
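
To make the self-deadlock hazard described above concrete, here's a
minimal standalone sketch (nothing PostgreSQL-specific; the lock-based
"atomic" read, the handler, and all names are made up for illustration).
When a 64bit read has to take a lock, a signal arriving while that lock
is already held in the same process hangs the backend:

#include <signal.h>
#include <stdint.h>
#include <stdio.h>

static int emulated_lock;	/* 0 = free, 1 = held; stands in for slock_t */
static uint64_t generation;	/* stands in for psh_barrierGeneration */

/* a "64 bit atomic" read emulated with a lock, as in --disable-atomics builds */
static uint64_t
emulated_read_u64(void)
{
	uint64_t	val;

	while (__sync_lock_test_and_set(&emulated_lock, 1) != 0)
		;	/* spins forever if this process already holds the lock */
	val = generation;
	__sync_lock_release(&emulated_lock);
	return val;
}

/* like a signal handler peeking at the barrier generation */
static void
sigusr1_handler(int signo)
{
	(void) signo;
	(void) emulated_read_u64();
}

int
main(void)
{
	signal(SIGUSR1, sigusr1_handler);

	/* a regular caller takes the lock ... */
	while (__sync_lock_test_and_set(&emulated_lock, 1) != 0)
		;
	/* ... and is interrupted before releasing it: the handler never returns */
	raise(SIGUSR1);

	__sync_lock_release(&emulated_lock);
	printf("no deadlock\n");	/* never reached */
	return 0;
}

Patch 1 below sidesteps this by using 32bit generations, which can be
read without taking a lock even when atomics are emulated.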

Attached is a series of patches addressing these issues, of varying
quality:

1) This fixes the above mentioned issue in the global barrier code by
   using 32bit atomics. That might be fine, or it might not. I just
   included it here because otherwise the tests cannot be run fully.


2) Fixes spinlock emulation when more than INT_MAX spinlocks are
   initialized in the lifetime of a single backend (a worked example of
   the overflow follows this list).

3) Add spinlock tests to normal regression tests.
   - Currently as part of test_atomic_ops. Probably not worth having a
     separate SQL function?
   - Currently contains a test for 2) that's run when the spinlock
     emulation is used. Probably too slow to actually include? Takes 15s
     on my computer... OTOH, it's just with --disable-spinlocks...
   - Could probably remove the current spinlock tests after this. The
     only thing they additionally test is a stuck spinlock. Since
     they're not run currently, they don't seem worth much?

4) Fix the potential for deadlocks when using atomics while holding a
   spinlock, add tests for that.
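
To see why 2) matters, here's a tiny worked example (standalone;
NUM_SPINLOCK_SEMAPHORES is hard-coded to 128, the default from
pg_config_manual.h, and the snippet only illustrates the arithmetic, it
is not the patched code). Once the signed counter wraps past INT_MAX,
C's % yields a negative remainder and the computed lock index fails the
range check in tas_sema():

#include <limits.h>
#include <stdio.h>

#define NUM_SPINLOCK_SEMAPHORES 128	/* default from pg_config_manual.h */

int
main(void)
{
	/* a value the old signed counter reaches shortly after wrapping */
	int			counter = INT_MIN + 1;
	unsigned int ucounter = (unsigned int) counter;

	/* old code: prints -126, outside the valid range 1..128 */
	printf("signed index:   %d\n", (counter % NUM_SPINLOCK_SEMAPHORES) + 1);

	/* with an unsigned counter the index stays in range: prints 2 */
	printf("unsigned index: %u\n", (ucounter % NUM_SPINLOCK_SEMAPHORES) + 1);

	return 0;
}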

Any comments?

Greetings,

Andres Freund
>From 17a7730c282018d02e3d82990f4fd84e0d2f6ce0 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 8 Jun 2020 17:10:26 -0700
Subject: [PATCH v1 1/4] WORKAROUND: Avoid spinlock use in signal handler by
 using 32bit generations.

As 64bit atomic reads aren't guaranteed to be atomic on all
platforms, we can end up with spinlocks being used inside signal
handlers, which obviously is dangerous, as that can easily create
self-deadlocks.

32bit atomics don't have that problem. But 32bit generations might not
be enough in all situations (?). So this possibly isn't the right
fix.

The problem can easily be reproduced by running make check in a
--disable-spinlocks --disable-atomics build.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/include/storage/procsignal.h     |  4 +--
 src/backend/storage/ipc/procsignal.c | 48 ++++++++++++++--------------
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/src/include/storage/procsignal.h b/src/include/storage/procsignal.h
index a0c0bc3ce55..c5a7e362dd9 100644
--- a/src/include/storage/procsignal.h
+++ b/src/include/storage/procsignal.h
@@ -65,8 +65,8 @@ extern void ProcSignalInit(int pss_idx);
 extern int	SendProcSignal(pid_t pid, ProcSignalReason reason,
 						   BackendId backendId);
 
-extern uint64 EmitProcSignalBarrier(ProcSignalBarrierType type);
-extern void WaitForProcSignalBarrier(uint64 generation);
+extern uint32 EmitProcSignalBarrier(ProcSignalBarrierType type);
+extern void WaitForProcSignalBarrier(uint32 generation);
 extern void ProcessProcSignalBarrier(void);
 
 extern void procsignal_sigusr1_handler(SIGNAL_ARGS);
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c809196d06a..eba424e15a7 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -60,7 +60,7 @@ typedef struct
 {
 	pid_t		pss_pid;
 	sig_atomic_t pss_signalFlags[NUM_PROCSIGNALS];
-	pg_atomic_uint64 pss_barrierGeneration;
+	pg_atomic_uint32 pss_barrierGeneration;
 	pg_atomic_uint32 pss_barrierCheckMask;
 } ProcSignalSlot;
 
@@ -72,7 +72,7 @@ typedef struct
  */
 typedef struct
 {
-	pg_atomic_uint64 psh_barrierGeneration;
+	pg_atomic_uint32 psh_barrierGeneration;
 	ProcSignalSlot psh_slot[FLEXIBLE_ARRAY_MEMBER];
 } ProcSignalHeader;
 
@@ -126,7 +126,7 @@ ProcSignalShmemInit(void)
 	{
 		int			i;
 
-		pg_atomic_init_u64(&ProcSignal->psh_barrierGeneration, 0);
+		pg_atomic_init_u32(&ProcSignal->psh_barrierGeneration, 0);
 
 		for (i = 0; i < NumProcSignalSlots; ++i)
 		{
@@ -134,7 +134,7 @@ ProcSignalShmemInit(void)
 
 			slot->pss_pid = 0;
 			MemSet(slot->pss_signalFlags, 0, sizeof(slot->pss_signalFlags));
-			pg_atomic_init_u64(&slot->pss_barrierGeneration, PG_UINT64_MAX);
+			pg_atomic_init_u32(&slot->pss_barrierGeneration, PG_UINT32_MAX);
 			pg_atomic_init_u32(&slot->pss_barrierCheckMask, 0);
 		}
 	}
@@ -151,7 +151,7 @@ void
 ProcSignalInit(int pss_idx)
 {
 	volatile ProcSignalSlot *slot;
-	uint64		barrier_generation;
+	uint32		barrier_generation;
 
 	Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots);
 
@@ -178,8 +178,8 @@ ProcSignalInit(int pss_idx)
 	 */
 	pg_atomic_write_u32(&slot->pss_barrierCheckMask, 0);
 	barrier_generation =
-		pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration);
-	pg_atomic_write_u64(&slot->pss_barrierGeneration, barrier_generation);
+		pg_atomic_read_u32(&ProcSignal->psh_barrierGeneration);
+	pg_atomic_write_u32(&slot->pss_barrierGeneration, barrier_generation);
 	pg_memory_barrier();
 
 	/* Mark slot with my PID */
@@ -230,7 +230,7 @@ CleanupProcSignalState(int status, Datum arg)
 	 * Make this slot look like it's absorbed all possible barriers, so that
 	 * no barrier waits block on it.
 	 */
-	pg_atomic_write_u64(&slot->pss_barrierGeneration, PG_UINT64_MAX);
+	pg_atomic_write_u32(&slot->pss_barrierGeneration, PG_UINT32_MAX);
 
 	slot->pss_pid = 0;
 }
@@ -317,11 +317,11 @@ SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
  * Callers are entitled to assume that this function will not throw ERROR
  * or FATAL.
  */
-uint64
+uint32
 EmitProcSignalBarrier(ProcSignalBarrierType type)
 {
-	uint64		flagbit = UINT64CONST(1) << (uint64) type;
-	uint64		generation;
+	uint32		flagbit = 1 << (uint32) type;
+	uint32		generation;
 
 	/*
 	 * Set all the flags.
@@ -329,7 +329,7 @@ EmitProcSignalBarrier(ProcSignalBarrierType type)
 	 * Note that pg_atomic_fetch_or_u32 has full barrier semantics, so this is
 	 * totally ordered with respect to anything the caller did before, and
 	 * anything that we do afterwards. (This is also true of the later call to
-	 * pg_atomic_add_fetch_u64.)
+	 * pg_atomic_add_fetch_u32.)
 	 */
 	for (int i = 0; i < NumProcSignalSlots; i++)
 	{
@@ -342,7 +342,7 @@ EmitProcSignalBarrier(ProcSignalBarrierType type)
 	 * Increment the generation counter.
 	 */
 	generation =
-		pg_atomic_add_fetch_u64(&ProcSignal->psh_barrierGeneration, 1);
+		pg_atomic_add_fetch_u32(&ProcSignal->psh_barrierGeneration, 1);
 
 	/*
 	 * Signal all the processes, so that they update their advertised barrier
@@ -379,16 +379,16 @@ EmitProcSignalBarrier(ProcSignalBarrierType type)
  * 1 second.
  */
 void
-WaitForProcSignalBarrier(uint64 generation)
+WaitForProcSignalBarrier(uint32 generation)
 {
 	long		timeout = 125L;
 
 	for (int i = NumProcSignalSlots - 1; i >= 0; i--)
 	{
 		volatile ProcSignalSlot *slot = &ProcSignal->psh_slot[i];
-		uint64		oldval;
+		uint32		oldval;
 
-		oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration);
+		oldval = pg_atomic_read_u32(&slot->pss_barrierGeneration);
 		while (oldval < generation)
 		{
 			int			events;
@@ -401,7 +401,7 @@ WaitForProcSignalBarrier(uint64 generation)
 						  timeout, WAIT_EVENT_PROC_SIGNAL_BARRIER);
 			ResetLatch(MyLatch);
 
-			oldval = pg_atomic_read_u64(&slot->pss_barrierGeneration);
+			oldval = pg_atomic_read_u32(&slot->pss_barrierGeneration);
 			if (events & WL_TIMEOUT)
 				timeout = Min(timeout * 2, 1000L);
 		}
@@ -428,7 +428,7 @@ WaitForProcSignalBarrier(uint64 generation)
 void
 ProcessProcSignalBarrier(void)
 {
-	uint64		generation;
+	uint32		generation;
 	uint32		flags;
 
 	/* Exit quickly if there's no work to do. */
@@ -443,7 +443,7 @@ ProcessProcSignalBarrier(void)
 	 * happens before we atomically extract the flags, and that any subsequent
 	 * state changes happen afterward.
 	 */
-	generation = pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration);
+	generation = pg_atomic_read_u32(&ProcSignal->psh_barrierGeneration);
 	flags = pg_atomic_exchange_u32(&MyProcSignalSlot->pss_barrierCheckMask, 0);
 
 	/*
@@ -466,7 +466,7 @@ ProcessProcSignalBarrier(void)
 	 * things have changed further, it'll get fixed up when this function is
 	 * next called.
 	 */
-	pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierGeneration, generation);
+	pg_atomic_write_u32(&MyProcSignalSlot->pss_barrierGeneration, generation);
 }
 
 static void
@@ -515,11 +515,11 @@ CheckProcSignalBarrier(void)
 
 	if (slot != NULL)
 	{
-		uint64		mygen;
-		uint64		curgen;
+		uint32		mygen;
+		uint32		curgen;
 
-		mygen = pg_atomic_read_u64(&slot->pss_barrierGeneration);
-		curgen = pg_atomic_read_u64(&ProcSignal->psh_barrierGeneration);
+		mygen = pg_atomic_read_u32(&slot->pss_barrierGeneration);
+		curgen = pg_atomic_read_u32(&ProcSignal->psh_barrierGeneration);
 		return (mygen != curgen);
 	}
 
-- 
2.25.0.114.g5b0ca878e0

>From 23ded090b1ee9d91ea4455ddde30427c3756c78a Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 8 Jun 2020 15:25:49 -0700
Subject: [PATCH v1 2/4] spinlock emulation: Fix bug when more than INT_MAX
 spinlocks are initialized.

Once the counter went negative, we ended up with spinlocks that errored
out on first use (due to the check in tas_sema).

Author: Andres Freund
Discussion: https://postgr.es/m/20200606023103.avzrctgv7476x...@alap3.anarazel.de
Backpatch: 9.5-
---
 src/backend/storage/lmgr/spin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/storage/lmgr/spin.c b/src/backend/storage/lmgr/spin.c
index 4d2a4c6641a..753943e46d6 100644
--- a/src/backend/storage/lmgr/spin.c
+++ b/src/backend/storage/lmgr/spin.c
@@ -106,7 +106,7 @@ SpinlockSemaInit(void)
 void
 s_init_lock_sema(volatile slock_t *lock, bool nested)
 {
-	static int	counter = 0;
+	static uint32 counter = 0;
 
 	*lock = ((++counter) % NUM_SPINLOCK_SEMAPHORES) + 1;
 }
-- 
2.25.0.114.g5b0ca878e0

>From 7eb9faf6b33792a22eccc4dd6f5fc6727c17fb41 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 8 Jun 2020 16:36:51 -0700
Subject: [PATCH v1 3/4] Add basic spinlock tests to regression tests.

As s_lock_test, the already existing test for spinlocks, isn't run in
an automated fashion (and doesn't test a normal backend environment),
adding tests that are run as part of a normal regression run is a good
idea.

Currently the new tests are run as part of the pre-existing
test_atomic_ops() test. That can probably be quibbled about.

The only operations that s_lock_test tests but the new tests don't are
the detection of a stuck spinlock and S_LOCK_FREE (which is otherwise
unused).

This currently contains a test for more than INT_MAX spinlocks (only
run with --disable-spinlocks), to ensure the previous commit fixing a
bug with more than INT_MAX spinlock initializations is correct, but
that might be too slow to enable generally.

It might be worth retiring s_lock_test after this. The added coverage
of a stuck spinlock probably isn't worth the added complexity?

Author: Andres Freund
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/test/regress/regress.c | 89 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 960c155e5f2..a48f9de2532 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -794,6 +794,92 @@ test_atomic_uint64(void)
 	EXPECT_EQ_U64(pg_atomic_fetch_and_u64(&var, ~0), 0);
 }
 
+static void
+test_spinlock(void)
+{
+	{
+		struct test_lock_struct
+		{
+			uint32		data_before;
+			slock_t		lock;
+			uint32		data_after;
+		} struct_w_lock;
+
+		struct_w_lock.data_before = 0x44;
+		struct_w_lock.data_after = 0x17;
+
+		/* test basic operations via the SpinLock* API */
+		SpinLockInit(&struct_w_lock.lock);
+		SpinLockAcquire(&struct_w_lock.lock);
+		SpinLockRelease(&struct_w_lock.lock);
+
+		/* test basic operations via underlying S_* API */
+		S_INIT_LOCK(&struct_w_lock.lock);
+		S_LOCK(&struct_w_lock.lock);
+		S_UNLOCK(&struct_w_lock.lock);
+
+		/* and that "contended" acquisition works */
+		s_lock(&struct_w_lock.lock, "testfile", 17, "testfunc");
+		S_UNLOCK(&struct_w_lock.lock);
+
+		/*
+		 * Check, using TAS directly, that a single spin cycle doesn't block
+		 * when acquiring an already acquired lock.
+		 */
+#ifdef TAS
+		S_LOCK(&struct_w_lock.lock);
+		if (!TAS(&struct_w_lock.lock))
+			elog(ERROR, "acquired already held spinlock");
+
+#ifdef TAS_SPIN
+		if (!TAS_SPIN(&struct_w_lock.lock))
+			elog(ERROR, "acquired already held spinlock");
+#endif							/* defined(TAS_SPIN) */
+
+		S_UNLOCK(&struct_w_lock.lock);
+#endif							/* defined(TAS) */
+
+		/*
+		 * Verify that after all of this the non-lock contents are still
+		 * correct.
+		 */
+		EXPECT_EQ_U32(struct_w_lock.data_before, 0x44);
+		EXPECT_EQ_U32(struct_w_lock.data_after, 0x17);
+	}
+
+	/*
+	 * Ensure that allocating more than INT32_MAX simulated spinlocks
+	 * works. This is probably too expensive to run in each regression test.
+	 */
+#ifndef HAVE_SPINLOCKS
+	{
+		/*
+		 * Initialize enough spinlocks to advance counter close to
+		 * wraparound. It's too expensive to perform acquire/release for each,
+		 * as those may be syscalls when the spinlock emulation is used (and
+		 * even just atomic TAS would be expensive).
+		 */
+		for (uint32 i = 0; i < INT32_MAX - 100000; i++)
+		{
+			slock_t lock;
+
+			SpinLockInit(&lock);
+		}
+
+		for (uint32 i = 0; i < 200000; i++)
+		{
+			slock_t lock;
+
+			SpinLockInit(&lock);
+
+			SpinLockAcquire(&lock);
+			SpinLockRelease(&lock);
+			SpinLockAcquire(&lock);
+			SpinLockRelease(&lock);
+		}
+	}
+#endif
+}
 
 PG_FUNCTION_INFO_V1(test_atomic_ops);
 Datum
@@ -805,6 +891,9 @@ test_atomic_ops(PG_FUNCTION_ARGS)
 
 	test_atomic_uint64();
 
+	/* XXX: Is there a better location for this? */
+	test_spinlock();
+
 	PG_RETURN_BOOL(true);
 }
 
-- 
2.25.0.114.g5b0ca878e0

>From 1d313fd848882c97651e3d7592e71496bbbcc339 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 8 Jun 2020 16:50:37 -0700
Subject: [PATCH v1 4/4] Fix deadlock danger when atomic ops are done under
 spinlock.

This was a danger only for --disable-spinlocks in combination with
atomic operations unsupported by the current platform.

While atomics.c was careful to signal that a separate semaphore ought
to be used when spinlock emulation is active, spin.c didn't actually
implement that mechanism. That's my (Andres') fault, it seems to have
gotten lost during the development of the atomic operations support.

Fix that issue and add a test for nesting atomic operations inside a
spinlock.

Author: Andres Freund
Discussion: https://postgr.es/m/20200605023302.g6v3ydozy5txi...@alap3.anarazel.de
Backpatch: 9.5-
---
 src/backend/storage/lmgr/spin.c | 95 +++++++++++++++++++++++----------
 src/test/regress/regress.c      | 43 +++++++++++++++
 2 files changed, 109 insertions(+), 29 deletions(-)

diff --git a/src/backend/storage/lmgr/spin.c b/src/backend/storage/lmgr/spin.c
index 753943e46d6..141606496eb 100644
--- a/src/backend/storage/lmgr/spin.c
+++ b/src/backend/storage/lmgr/spin.c
@@ -28,8 +28,24 @@
 
 
 #ifndef HAVE_SPINLOCKS
+
+/*
+ * No TAS, so spinlocks are implemented as PGSemaphores.
+ */
+
+#ifndef HAVE_ATOMICS
+#define NUM_SIMULATION_SEMAPHORES (NUM_SPINLOCK_SEMAPHORES + NUM_ATOMICS_SEMAPHORES)
+#else
+#define NUM_SIMULATION_SEMAPHORES (NUM_SPINLOCK_SEMAPHORES)
+#endif /* HAVE_ATOMICS */
+
 PGSemaphore *SpinlockSemaArray;
-#endif
+
+#else							/* !HAVE_SPINLOCKS */
+
+#define NUM_SIMULATION_SEMAPHORES 0
+
+#endif							/* HAVE_SPINLOCKS */
 
 /*
  * Report the amount of shared memory needed to store semaphores for spinlock
@@ -38,34 +54,19 @@ PGSemaphore *SpinlockSemaArray;
 Size
 SpinlockSemaSize(void)
 {
-	return SpinlockSemas() * sizeof(PGSemaphore);
+	return NUM_SIMULATION_SEMAPHORES * sizeof(PGSemaphore);
 }
 
-#ifdef HAVE_SPINLOCKS
-
 /*
  * Report number of semaphores needed to support spinlocks.
  */
 int
 SpinlockSemas(void)
 {
-	return 0;
+	return NUM_SIMULATION_SEMAPHORES;
 }
-#else							/* !HAVE_SPINLOCKS */
 
-/*
- * No TAS, so spinlocks are implemented as PGSemaphores.
- */
-
-
-/*
- * Report number of semaphores needed to support spinlocks.
- */
-int
-SpinlockSemas(void)
-{
-	return NUM_SPINLOCK_SEMAPHORES + NUM_ATOMICS_SEMAPHORES;
-}
+#ifndef HAVE_SPINLOCKS
 
 /*
  * Initialize spinlock emulation.
@@ -92,23 +93,59 @@ SpinlockSemaInit(void)
 /*
  * s_lock.h hardware-spinlock emulation using semaphores
  *
- * We map all spinlocks onto a set of NUM_SPINLOCK_SEMAPHORES semaphores.
- * It's okay to map multiple spinlocks onto one semaphore because no process
- * should ever hold more than one at a time.  We just need enough semaphores
- * so that we aren't adding too much extra contention from that.
+ * We map all spinlocks onto NUM_SIMULATION_SEMAPHORES semaphores.  It's okay to
+ * map multiple spinlocks onto one semaphore because no process should ever
+ * hold more than one at a time.  We just need enough semaphores so that we
+ * aren't adding too much extra contention from that.
+ *
+ * There is one exception to the restriction of only holding one spinlock at a
+ * time, which is that it's ok if emulated atomic operations are nested inside
+ * spinlocks. To avoid the danger of spinlocks and atomics using the same sema,
+ * we make sure "normal" spinlocks and atomics backed by spinlocks use
+ * distinct semaphores (see the nested argument to s_init_lock_sema).
  *
  * slock_t is just an int for this implementation; it holds the spinlock
- * number from 1..NUM_SPINLOCK_SEMAPHORES.  We intentionally ensure that 0
+ * number from 1..NUM_SIMULATION_SEMAPHORES.  We intentionally ensure that 0
  * is not a valid value, so that testing with this code can help find
  * failures to initialize spinlocks.
  */
 
+static inline void
+s_check_valid(int lockndx)
+{
+	if (unlikely(lockndx <= 0 || lockndx > NUM_SIMULATION_SEMAPHORES))
+		elog(ERROR, "invalid spinlock number: %d", lockndx);
+}
+
 void
 s_init_lock_sema(volatile slock_t *lock, bool nested)
 {
 	static uint32 counter = 0;
+	uint32		offset;
+	uint32		sema_total;
+	uint32		idx;
 
-	*lock = ((++counter) % NUM_SPINLOCK_SEMAPHORES) + 1;
+	if (nested)
+	{
+		/*
+		 * To allow nesting atomics inside spinlocked sections, use a
+		 * different set of semaphores. See comment above.
+		 */
+		offset = 1 + NUM_SPINLOCK_SEMAPHORES;
+		sema_total = NUM_ATOMICS_SEMAPHORES;
+	}
+	else
+	{
+		offset = 1;
+		sema_total = NUM_SPINLOCK_SEMAPHORES;
+	}
+
+	idx = (counter++ % sema_total) + offset;
+
+	/* double check we did things correctly */
+	s_check_valid(idx);
+
+	*lock = idx;
 }
 
 void
@@ -116,8 +153,8 @@ s_unlock_sema(volatile slock_t *lock)
 {
 	int			lockndx = *lock;
 
-	if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
-		elog(ERROR, "invalid spinlock number: %d", lockndx);
+	s_check_valid(lockndx);
+
 	PGSemaphoreUnlock(SpinlockSemaArray[lockndx - 1]);
 }
 
@@ -134,8 +171,8 @@ tas_sema(volatile slock_t *lock)
 {
 	int			lockndx = *lock;
 
-	if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
-		elog(ERROR, "invalid spinlock number: %d", lockndx);
+	s_check_valid(lockndx);
+
 	/* Note that TAS macros return 0 if *success* */
 	return !PGSemaphoreTryLock(SpinlockSemaArray[lockndx - 1]);
 }
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index a48f9de2532..231aab9d569 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -794,6 +794,47 @@ test_atomic_uint64(void)
 	EXPECT_EQ_U64(pg_atomic_fetch_and_u64(&var, ~0), 0);
 }
 
+/*
+ * Verify that performing atomic ops inside a spinlock isn't a
+ * problem. Realistically that's only going to be a problem when both
+ * --disable-spinlocks and --disable-atomics are used, but it's cheap enough
+ * to just always test.
+ *
+ * The test works by holding a spinlock while manipulating more than
+ * NUM_SPINLOCK_SEMAPHORES atomics, so that we would deadlock if a spinlock
+ * and an atomic were ever mapped onto the same semaphore.
+ *
+ * NUM_TEST_ATOMICS doesn't really need to be more than
+ * NUM_SPINLOCK_SEMAPHORES, but it seems better to test a bit more
+ * extensively.
+ */
+static void
+test_atomic_spin_nest(void)
+{
+	slock_t lock;
+#define NUM_TEST_ATOMICS (NUM_SPINLOCK_SEMAPHORES + NUM_ATOMICS_SEMAPHORES + 27)
+	pg_atomic_uint32 atomics[NUM_TEST_ATOMICS];
+
+	SpinLockInit(&lock);
+
+	for (int i = 0; i < NUM_TEST_ATOMICS; i++)
+		pg_atomic_init_u32(&atomics[i], 0);
+
+	/* just so it's not all zeroes */
+	for (int i = 0; i < NUM_TEST_ATOMICS; i++)
+		EXPECT_EQ_U32(pg_atomic_fetch_add_u32(&atomics[i], i), 0);
+
+	/* test whether we can do atomic op with lock held */
+	SpinLockAcquire(&lock);
+	for (int i = 0; i < NUM_TEST_ATOMICS; i++)
+	{
+		EXPECT_EQ_U32(pg_atomic_fetch_sub_u32(&atomics[i], i), i);
+		EXPECT_EQ_U32(pg_atomic_read_u32(&atomics[i]), 0);
+	}
+	SpinLockRelease(&lock);
+}
+#undef NUM_TEST_ATOMICS
+
 static void
 test_spinlock(void)
 {
@@ -891,6 +932,8 @@ test_atomic_ops(PG_FUNCTION_ARGS)
 
 	test_atomic_uint64();
 
+	test_atomic_spin_nest();
+
 	/* XXX: Is there a better location for this? */
 	test_spinlock();
 
-- 
2.25.0.114.g5b0ca878e0
