Fix a race condition in ConditionVariableTimedSleep()

Bertrand Drouvot Mon, 05 May 2025 00:53:07 -0700

Hi hackers,

While working on wait event related stuff I observed a failed assertion:


"
TRAP: failed Assert("node->next == 0 && node->prev == 0"), File: 
"../../../../src/include/storage/proclist.h", Line: 91
"

during pg_regress/database.

To reproduce, add an ereport(LOG,..) or CHECK_FOR_INTERRUPTS() or whatever
would trigger ConditionVariableBroadcast() in pgstat_report_wait_end():

And launch:

postgres=# CREATE TABLESPACE regress_tblspace LOCATION '/home/postgres/bdttbs';
CREATE TABLESPACE
postgres=# create database regression_bdt9;
CREATE DATABASE
postgres=# ALTER DATABASE regression_bdt9 SET TABLESPACE regress_tblspace;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

The issue is that in ConditionVariableTimedSleep() during the WaitLatch() we’ll
also trigger ProcessInterrupts() that will call ConditionVariableCancelSleep()
that will:

- remove the process from the wait list
- and also set cv_sleep_target to NULL.

Then in ConditionVariableTimedSleep(), we’ll go through:

“
          /*
           * If this process has been taken out of the wait list, then we know
           * that it has been signaled by ConditionVariableSignal (or
           * ConditionVariableBroadcast), so we should return to the caller. But
           * that doesn't guarantee that the exit condition is met, only that we
           * ought to check it.  So we must put the process back into the wait
           * list, to ensure we don't miss any additional wakeup occurring while
           * the caller checks its exit condition.  We can take ourselves out of
           * the wait list only when the caller calls
           * ConditionVariableCancelSleep.
           *
           * If we're still in the wait list, then the latch must have been set
           * by something other than ConditionVariableSignal; though we don't
           * guarantee not to return spuriously, we'll avoid this obvious case.
           */
          SpinLockAcquire(lock: &cv->mutex);
          if (!proclist_contains(&cv->wakeup, MyProcNumber, cvWaitLink))
          {
              done = true;
              proclist_push_tail(&cv->wakeup, MyProcNumber, cvWaitLink);
          }
          SpinLockRelease(&cv->mutex);
“

Means we re-add the process in the wait list. The issue is that cv_sleep_target
is still NULL so that we’ll exit ConditionVariableTimedSleep() due to:

"
          if (cv != cv_sleep_target)
              done = true;

          /* We were signaled, so return */
          if (done)
              return false;
"

Later we’ll want to add our process to a wait list, calling 
ConditionVariablePrepareToSleep()
that will not call ConditionVariableCancelSleep() due to:

“
      if (cv_sleep_target != NULL)
          ConditionVariableCancelSleep();
“

As cv_sleep_target is still NULL.
Finally calling proclist_push_tail() that will trigger the failed assertion.

The proposed fix attached is done in ConditionVariableTimedSleep() as this is 
the
place that introduces the race condition. It re-assigns cv_sleep_target to cv
and then ensures that cv_sleep_target accurately describes which condition
variable we’re prepared to wait on.

Looking forward to your feedback,

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

>From 7823cfe0328883cec4179d5b024bbeb99c270f75 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot <bertranddrouvot...@gmail.com>
Date: Mon, 5 May 2025 06:57:29 +0000
Subject: [PATCH v1] Fix a race condition in ConditionVariableTimedSleep()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A process could exit ConditionVariableTimedSleep() after detecting a state
change (cv != cv_sleep_target) without being removed from the list.

When ConditionVariableTimedSleep() detects that the process is no longer
in the wait list (could be signaled by ConditionVariableSignal() or
ConditionVariableBroadcast()), it re-adds itself to ensure that we don't miss
any additional wakeup.

The issue is that it could leave ConditionVariableTimedSleep() with cv_sleep_target
set to NULL and the process re-added in the wait list.

If cv_sleep_target is set to NULL then subsequent calls to ConditionVariableCancelSleep()
will return early without removing the process from any list.

This could to an assertion failure when the process later tries to wait on a
condition variable, as ConditionVariablePrepareToSleep() skips cleanup
when cv_sleep_target is NULL, resulting in a process trying to be in two wait
lists simultaneously.

The fix re-assigns cv_sleep_target to cv and then ensures that cv_sleep_target
accurately describes which condition variable we’re prepared to wait on.
---
 src/backend/storage/lmgr/condition_variable.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index 228303e77f7..198c1988919 100644
--- a/src/backend/storage/lmgr/condition_variable.c
+++ b/src/backend/storage/lmgr/condition_variable.c
@@ -184,6 +184,8 @@ ConditionVariableTimedSleep(ConditionVariable *cv, long timeout,
 		if (!proclist_contains(&cv->wakeup, MyProcNumber, cvWaitLink))
 		{
 			done = true;
+			if (cv_sleep_target == NULL)
+				cv_sleep_target = cv;
 			proclist_push_tail(&cv->wakeup, MyProcNumber, cvWaitLink);
 		}
 		SpinLockRelease(&cv->mutex);
-- 
2.34.1

Fix a race condition in ConditionVariableTimedSleep()

Reply via email to