On Fri, Aug 12, 2016 at 9:47 AM, Robert Haas <robertmh...@gmail.com> wrote:
> https://en.wikipedia.org/wiki/Monitor_(synchronization)#Condition_variables_2
>
> Basically, a condition variable has three operations: you can wait for
> the condition variable; you can signal the condition variable to wake
> up one waiter; or you can broadcast on the condition variable to wake
> up all waiters.  Atomically with entering the wait, you must be able
> to check whether the condition is satisfied.  So, in my
> implementation, a condition variable wait loop looks like this:
>
> for (;;)
> {
>     ConditionVariablePrepareToSleep(cv);
>     if (condition for which we are waiting is satisfied)
>         break;
>     ConditionVariableSleep();
> }
> ConditionVariableCancelSleep();
>
> To wake up one waiter, another backend can call
> ConditionVariableSignal(cv); to wake up all waiters,
> ConditionVariableBroadcast(cv).
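To check that I understand the proposed usage, here's a minimal sketch of both sides, assuming a shared-memory struct holding the condition variable and a flag.  The struct, the flag and the "ConditionVariable" type name are my guesses for illustration, not necessarily what's in your patch:

#include "postgres.h"
#include "port/atomics.h"
/* ... plus whatever header ends up declaring the proposed CV API */

/* Guessed shared state: a flag plus the condition variable. */
typedef struct SharedWorkState
{
    ConditionVariable cv;
    pg_atomic_uint32  done;     /* set to 1 when the work is finished */
} SharedWorkState;

/* Waiter side, following the loop quoted above. */
static void
wait_for_done(SharedWorkState *state)
{
    for (;;)
    {
        ConditionVariablePrepareToSleep(&state->cv);
        if (pg_atomic_read_u32(&state->done) == 1)
            break;
        ConditionVariableSleep();
    }
    ConditionVariableCancelSleep();
}

/* Signaller side: publish the state change, then wake all waiters. */
static void
mark_done(SharedWorkState *state)
{
    pg_atomic_write_u32(&state->done, 1);
    ConditionVariableBroadcast(&state->cv);
}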
It is interesting to compare this interface with Wikipedia's description, POSIX's pthread_cond_t and C++'s std::condition_variable.  In those interfaces, the wait operation takes a mutex which must already be held by the caller.  It unlocks the mutex and begins waiting atomically, and when it returns, the mutex has been reacquired.  This approach avoids race conditions as long as the shared state change you are awaiting is protected by that mutex: if you check that state before waiting, while still holding the lock, you can be sure not to miss any change signals, and when the wait returns you can check the state again knowing that no one can be concurrently changing it (see the pthreads sketch below).

In contrast, this proposal leaves it up to client code to get that right, similarly to the way you need to do things in a certain order when waiting for state changes with latches.  You could say that it's more error prone: I think there have been a few cases of incorrectly coded latch/state-change wait loops in the past.  On the other hand, it places no requirements on the synchronisation mechanism the client code uses for the related shared state.  pthread_cond_wait requires you to pass in a pointer to the related pthread_mutex_t, whereas with this proposal client code is free to use atomic ops, lwlocks, spinlocks or any other mutual exclusion mechanism to coordinate state changes and deal with cache coherency.

Then there is the question of what happens when the backend that is supposed to be doing the signalling dies or aborts, which Tom Lane referred to in his reply.  In those other libraries there is no such concern: it's understood that these are low-level thread synchronisation primitives, and if you're waiting for something that never happens, you'll be waiting forever.  I don't know what the answer is in general for Postgres condition variables, but...

The thing that I am personally working on at the moment, which is very closely related and could use this, has a more specific set of circumstances: I want "join points", AKA barriers.  Something like pthread_barrier_t.  (I'm saying "join point" rather than "barrier" to avoid confusion with compiler and memory barriers, barrier.h etc.)  Join points let you wait for all workers in a known set to reach a given point, possibly with a phase number or at least a sense (a one-bit phase counter) to detect synchronisation bugs.  They also select one worker arbitrarily to receive a different return value when releasing workers from a join point, for cases where a particular phase of parallel work needs to be done by exactly one worker while the others sit on the bench: for example initialisation, cleanup or merging (cf. PTHREAD_BARRIER_SERIAL_THREAD).

Clearly a join point could be not much more than a condition variable and some state tracking arrivals and departures, but I think this higher-level synchronisation primitive might have an advantage over raw condition variables in the abort case: it can know the total set of workers that it's waiting for, if they are somehow registered with it first, and registration can include arranging for cleanup hooks to do the right thing.  It's already a requirement for a join point to know which workers exist (or at least how many).  The deal would then be that when you call joinpoint_join(&some_joinpoint, phase), it returns only when all peers have joined or detached, where the latter happens automatically if they abort or die.  Not at all sure of the details yet; a rough sketch of the idea is below...
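Going back to the comparison above, the classic POSIX pattern looks something like this (plain pthreads, nothing Postgres-specific; shown only to make the contrast concrete):

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
static bool work_done = false;      /* shared state protected by mutex */

static void
wait_for_work_done(void)
{
    pthread_mutex_lock(&mutex);
    while (!work_done)                      /* check while holding the mutex */
        pthread_cond_wait(&cond, &mutex);   /* atomically unlock, wait, relock */
    /* work_done is true and we hold the mutex, so it can't change under us */
    pthread_mutex_unlock(&mutex);
}

static void
mark_work_done(void)
{
    pthread_mutex_lock(&mutex);
    work_done = true;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);
}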
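Purely to illustrate the join point idea, here is the sort of thing I have in mind, built on top of the proposed condition variables.  Every name here (JoinPoint, joinpoint_join, the spinlock bookkeeping) is made up, and the abort/detach problem discussed above is ignored:

#include "postgres.h"
#include "storage/spin.h"
/* ... plus whatever header declares the proposed CV API */

/* Hypothetical join point; initialisation of the spinlock etc. not shown. */
typedef struct JoinPoint
{
    slock_t           mutex;        /* protects the counters below */
    int               nworkers;     /* size of the known set of workers */
    int               narrived;     /* how many have joined this phase */
    int               phase;        /* current phase number */
    ConditionVariable cv;
} JoinPoint;

/* Returns true in exactly one worker per phase, a bit like
 * PTHREAD_BARRIER_SERIAL_THREAD. */
static bool
joinpoint_join(JoinPoint *jp, int phase)
{
    bool        last;

    SpinLockAcquire(&jp->mutex);
    Assert(jp->phase == phase);     /* detect synchronisation bugs */
    last = (++jp->narrived == jp->nworkers);
    if (last)
    {
        /* Last to arrive: start the next phase and release everyone. */
        jp->narrived = 0;
        jp->phase++;
        SpinLockRelease(&jp->mutex);
        ConditionVariableBroadcast(&jp->cv);
        return true;
    }
    SpinLockRelease(&jp->mutex);

    /* Wait for the phase to advance, using the quoted wait-loop pattern. */
    for (;;)
    {
        bool        advanced;

        ConditionVariablePrepareToSleep(&jp->cv);
        SpinLockAcquire(&jp->mutex);
        advanced = (jp->phase != phase);
        SpinLockRelease(&jp->mutex);
        if (advanced)
            break;
        ConditionVariableSleep();
    }
    ConditionVariableCancelSleep();
    return false;
}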
But I suspect join points are useful for a bunch of things like parallel sort, parallel hash join (my project), and anything else involving phases or some form of "fork/join" parallelism.

Or perhaps that type of thinking about error handling should be pushed down to the condition variable.  How would that look: would all potential signallers have to register to deliver a goodbye signal in their abort and shmem exit paths?  And then what happens if you die before registering?  I think even if you find a way to do that, I'd still need to do similar extra work on top for my join points concept, because although I do need waiters to be poked when a worker aborts or dies, one goodbye prod isn't enough: I'd also need to adjust the join point's set of workers, or put it into error state.

--
Thomas Munro
http://www.enterprisedb.com