Hi hackers, While working on wait event related stuff I observed a failed assertion:
" TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: "../../../../src/include/storage/proclist.h", Line: 91 " during pg_regress/database. To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever would trigger ConditionVariableBroadcast() in pgstat_report_wait_end(): And launch: postgres=# CREATE TABLESPACE regress_tblspace LOCATION '/home/postgres/bdttbs'; CREATE TABLESPACE postgres=# create database regression_bdt9; CREATE DATABASE postgres=# ALTER DATABASE regression_bdt9 SET TABLESPACE regress_tblspace; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Succeeded. The issue is that in ConditionVariableTimedSleep() during the WaitLatch() we’ll also trigger ProcessInterrupts() that will call ConditionVariableCancelSleep() that will: - remove the process from the wait list - and also set cv_sleep_target to NULL. Then in ConditionVariableTimedSleep(), we’ll go through: “ /* * If this process has been taken out of the wait list, then we know * that it has been signaled by ConditionVariableSignal (or * ConditionVariableBroadcast), so we should return to the caller. But * that doesn't guarantee that the exit condition is met, only that we * ought to check it. So we must put the process back into the wait * list, to ensure we don't miss any additional wakeup occurring while * the caller checks its exit condition. We can take ourselves out of * the wait list only when the caller calls * ConditionVariableCancelSleep. * * If we're still in the wait list, then the latch must have been set * by something other than ConditionVariableSignal; though we don't * guarantee not to return spuriously, we'll avoid this obvious case. */ SpinLockAcquire(lock: &cv->mutex); if (!proclist_contains(&cv->wakeup, MyProcNumber, cvWaitLink)) { done = true; proclist_push_tail(&cv->wakeup, MyProcNumber, cvWaitLink); } SpinLockRelease(&cv->mutex); “ Means we re-add the process in the wait list. The issue is that cv_sleep_target is still NULL so that we’ll exit ConditionVariableTimedSleep() due to: " if (cv != cv_sleep_target) done = true; /* We were signaled, so return */ if (done) return false; " Later we’ll want to add our process to a wait list, calling ConditionVariablePrepareToSleep() that will not call ConditionVariableCancelSleep() due to: “ if (cv_sleep_target != NULL) ConditionVariableCancelSleep(); “ As cv_sleep_target is still NULL. Finally calling proclist_push_tail() that will trigger the failed assertion. The proposed fix attached is done in ConditionVariableTimedSleep() as this is the place that introduces the race condition. It re-assigns cv_sleep_target to cv and then ensures that cv_sleep_target accurately describes which condition variable we’re prepared to wait on. Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
>From 7823cfe0328883cec4179d5b024bbeb99c270f75 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <bertranddrouvot...@gmail.com> Date: Mon, 5 May 2025 06:57:29 +0000 Subject: [PATCH v1] Fix a race condition in ConditionVariableTimedSleep() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A process could exit ConditionVariableTimedSleep() after detecting a state change (cv != cv_sleep_target) without being removed from the list. When ConditionVariableTimedSleep() detects that the process is no longer in the wait list (could be signaled by ConditionVariableSignal() or ConditionVariableBroadcast()), it re-adds itself to ensure that we don't miss any additional wakeup. The issue is that it could leave ConditionVariableTimedSleep() with cv_sleep_target set to NULL and the process re-added in the wait list. If cv_sleep_target is set to NULL then subsequent calls to ConditionVariableCancelSleep() will return early without removing the process from any list. This could to an assertion failure when the process later tries to wait on a condition variable, as ConditionVariablePrepareToSleep() skips cleanup when cv_sleep_target is NULL, resulting in a process trying to be in two wait lists simultaneously. The fix re-assigns cv_sleep_target to cv and then ensures that cv_sleep_target accurately describes which condition variable we’re prepared to wait on. --- src/backend/storage/lmgr/condition_variable.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c index 228303e77f7..198c1988919 100644 --- a/src/backend/storage/lmgr/condition_variable.c +++ b/src/backend/storage/lmgr/condition_variable.c @@ -184,6 +184,8 @@ ConditionVariableTimedSleep(ConditionVariable *cv, long timeout, if (!proclist_contains(&cv->wakeup, MyProcNumber, cvWaitLink)) { done = true; + if (cv_sleep_target == NULL) + cv_sleep_target = cv; proclist_push_tail(&cv->wakeup, MyProcNumber, cvWaitLink); } SpinLockRelease(&cv->mutex); -- 2.34.1