the s_lock_stuck on perform_spin_delay

Andy Fan Wed, 03 Jan 2024 23:09:23 -0800

Hi,

from src/backend/storage/lmgr/README:


"""
Spinlocks.  These are intended for *very* short-term locks.  If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call (or even a call to a nontrivial subroutine), don't use a
spinlock. Spinlocks are primarily used as infrastructure for lightweight
locks.
"""

I totally agree with this and IIUC spin lock is usually used with the
following functions.

#define init_local_spin_delay(status) ..
void perform_spin_delay(SpinDelayStatus *status);
void finish_spin_delay(SpinDelayStatus *status);

During the perform_spin_delay, we have the following codes:

void
perform_spin_delay(SpinDelayStatus *status)

        /* Block the process every spins_per_delay tries */
        if (++(status->spins) >= spins_per_delay)
        {
                if (++(status->delays) > NUM_DELAYS)
                        s_lock_stuck(status->file, status->line, status->func);

the s_lock_stuck will PAINC the entire system.

My question is if someone doesn't obey the rule by mistake (everyone
can make mistake), shall we PANIC on a production environment? IMO I
think it can be a WARNING on a production environment and be a stuck
when 'ifdef USE_ASSERT_CHECKING'.

People may think spin lock may consume too much CPU, but it is not true
in the discussed scene since perform_spin_delay have pg_usleep in it,
and the MAX_DELAY_USEC is 1 second and MIN_DELAY_USEC is 0.001s.

I notice this issue actually because of the patch "Cache relation
sizes?" from Thomas Munro [1], where the latest patch[2] still have the 
following code. 
+               sr = smgr_alloc_sr();  <-- HERE a spin lock is hold
+
+               /* Upgrade to exclusive lock so we can create a mapping. */
+               LWLockAcquire(mapping_lock, LW_EXCLUSIVE); <-- HERE a complex
  operation is needed. it may take a long time.

Our internal testing system found more misuses on our own PG version.

I think a experienced engineer like Thomas can make this mistake and the
patch was reviewed by 3 peoples, the bug is still there. It is not easy
to say just don't do it. 

the attached code show the prototype in my mind. Any feedback is welcome. 

[1]
https://www.postgresql.org/message-id/CA%2BhUKGJg%2BgqCs0dgo94L%3D1J9pDp5hKkotji9A05k2nhYQhF4%2Bw%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/attachment/123659/v5-0001-WIP-Track-relation-sizes-in-shared-memory.patch
  

-- 
Best Regards
Andy Fan

>From 7d7fd0f0e9b13a290bfffaec0ad40773191155f2 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" <yizhi....@alibaba-inc.com>
Date: Thu, 4 Jan 2024 14:33:37 +0800
Subject: [PATCH v1 1/1] Don't call s_lock_stuck in production environment

In the current implementation, if a spin lock is misused, the
s_lock_stuck in perform_spin_delay can cause the entire system to
PANIC. In order to balance fault tolerance and the ability to detect
incorrect usage, we can use WARNING to replace s_lock_stuck in the
production environment, but still use s_lock_stuck in builds with
USE_ASSERT_CHECKING.
---
 src/backend/storage/lmgr/s_lock.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index 327ac64f7c..9446605122 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -67,6 +67,7 @@ slock_t		dummy_spinlock;
 static int	spins_per_delay = DEFAULT_SPINS_PER_DELAY;
 
 
+#ifdef USE_ASSERT_CHECKING
 /*
  * s_lock_stuck() - complain about a stuck spinlock
  */
@@ -85,6 +86,7 @@ s_lock_stuck(const char *file, int line, const char *func)
 		 func, file, line);
 #endif
 }
+#endif
 
 /*
  * s_lock(lock) - platform-independent portion of waiting for a spinlock.
@@ -132,7 +134,14 @@ perform_spin_delay(SpinDelayStatus *status)
 	if (++(status->spins) >= spins_per_delay)
 	{
 		if (++(status->delays) > NUM_DELAYS)
+		{
+#ifdef USE_ASSERT_CHECKING
 			s_lock_stuck(status->file, status->line, status->func);
+#else
+			if (status->delays % NUM_DELAYS == 0)
+				elog(WARNING, "perform spin lock on %s %d times", status->func, status->delays);
+#endif
+		}
 
 		if (status->cur_delay == 0) /* first time to delay? */
 			status->cur_delay = MIN_DELAY_USEC;
-- 
2.34.1

the s_lock_stuck on perform_spin_delay

Reply via email to