Hi,

From src/backend/storage/lmgr/README:

"""
Spinlocks.  These are intended for *very* short-term locks.  If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call (or even a call to a nontrivial subroutine), don't use a
spinlock.  Spinlocks are primarily used as infrastructure for
lightweight locks.
"""

I fully agree with this, and IIUC a spinlock is usually used together
with the following functions:

#define init_local_spin_delay(status) ..
void perform_spin_delay(SpinDelayStatus *status);
void finish_spin_delay(SpinDelayStatus *status);

perform_spin_delay contains the following code:

void
perform_spin_delay(SpinDelayStatus *status)
{
	/* Block the process every spins_per_delay tries */
	if (++(status->spins) >= spins_per_delay)
	{
		if (++(status->delays) > NUM_DELAYS)
			s_lock_stuck(status->file, status->line, status->func);
		...

s_lock_stuck will PANIC the entire system.  My question is: if someone
breaks the rule by mistake (everyone can make mistakes), should we
really PANIC in a production environment?  IMO it could be a WARNING in
production, and only become a hard stop in builds with
USE_ASSERT_CHECKING.

People may think a spinlock would then consume too much CPU, but that
is not true in the scenario discussed here, since perform_spin_delay
calls pg_usleep with an increasing delay: MIN_DELAY_USEC is 0.001 s and
MAX_DELAY_USEC is 1 s.

I actually noticed this issue because of the patch "Cache relation
sizes?" from Thomas Munro [1], where the latest patch [2] still has the
following code:

+	sr = smgr_alloc_sr();   <-- HERE a spinlock is held
+
+	/* Upgrade to exclusive lock so we can create a mapping. */
+	LWLockAcquire(mapping_lock, LW_EXCLUSIVE);   <-- HERE a complex
	    operation is needed; it may take a long time.

Our internal testing system found more misuses in our own PG version.
If an experienced engineer like Thomas can make this mistake, and the
bug survived review by three people, it is not enough to just say
"don't do it".  The attached patch shows the prototype I have in mind.
Any feedback is welcome.

[1] https://www.postgresql.org/message-id/CA%2BhUKGJg%2BgqCs0dgo94L%3D1J9pDp5hKkotji9A05k2nhYQhF4%2Bw%40mail.gmail.com
[2] https://www.postgresql.org/message-id/attachment/123659/v5-0001-WIP-Track-relation-sizes-in-shared-memory.patch

-- 
Best Regards
Andy Fan
From 7d7fd0f0e9b13a290bfffaec0ad40773191155f2 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" <yizhi....@alibaba-inc.com>
Date: Thu, 4 Jan 2024 14:33:37 +0800
Subject: [PATCH v1 1/1] Don't call s_lock_stuck in production environment

In the current implementation, if a spin lock is misused, the
s_lock_stuck in perform_spin_delay can cause the entire system to
PANIC. In order to balance fault tolerance and the ability to detect
incorrect usage, we can use WARNING to replace s_lock_stuck in the
production environment, but still use s_lock_stuck in builds with
USE_ASSERT_CHECKING.
---
 src/backend/storage/lmgr/s_lock.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/backend/storage/lmgr/s_lock.c b/src/backend/storage/lmgr/s_lock.c
index 327ac64f7c..9446605122 100644
--- a/src/backend/storage/lmgr/s_lock.c
+++ b/src/backend/storage/lmgr/s_lock.c
@@ -67,6 +67,7 @@ slock_t		dummy_spinlock;
 
 static int	spins_per_delay = DEFAULT_SPINS_PER_DELAY;
 
+#ifdef USE_ASSERT_CHECKING
 /*
  * s_lock_stuck() - complain about a stuck spinlock
  */
@@ -85,6 +86,7 @@ s_lock_stuck(const char *file, int line, const char *func)
 			 func, file, line);
 #endif
 }
+#endif
 
 /*
  * s_lock(lock) - platform-independent portion of waiting for a spinlock.
@@ -132,7 +134,14 @@ perform_spin_delay(SpinDelayStatus *status)
 	if (++(status->spins) >= spins_per_delay)
 	{
 		if (++(status->delays) > NUM_DELAYS)
+		{
+#ifdef USE_ASSERT_CHECKING
 			s_lock_stuck(status->file, status->line, status->func);
+#else
+			if (status->delays % NUM_DELAYS == 0)
+				elog(WARNING, "perform spin lock on %s %d times", status->func, status->delays);
+#endif
+		}
 
 		if (status->cur_delay == 0) /* first time to delay? */
 			status->cur_delay = MIN_DELAY_USEC;
-- 
2.34.1