Hi,

Some of my EnterpriseDB colleagues and I have been working on various parallel query projects, all of which have been previously disclosed here:

https://wiki.postgresql.org/wiki/EnterpriseDB_database_server_roadmap

One issue we've encountered is that it's not very easy for one process in a group of cooperating parallel processes to wait for another process in that same group. One idea is to have one process grab an LWLock and have the other processes try to acquire it, but that actually doesn't work very well. A pretty obvious problem is that holding an LWLock holds off interrupts for the entire time the lock is held, which is pretty undesirable. A more subtle problem is that it's easy to conceive of situations where the LWLock paradigm is just a very poor fit for what you actually want to do. For example, suppose you have a computation which proceeds in two phases: each backend that finishes phase 1 must wait until all backends finish phase 1, and once all have finished, all can begin phase 2. You could handle this case by having an LWLock which everyone holds in shared mode during phase 1, and then everyone must briefly acquire it in exclusive mode before starting phase 2, but that's an awful hack. It also has race conditions: what if someone finishes phase 1 before everyone has started phase 1? And what if there are 10 phases instead of 2?

Another approach to the problem is to use a latch wait loop. That almost works. Interrupts can be serviced, and after each iteration of the loop you can recheck shared memory to see whether the condition for proceeding is satisfied. There's only one problem: when you do something that might cause the condition to be satisfied for other waiting backends, you need to set their latches - but you don't have an easy way to know exactly which processes are waiting, so how do you know whom to call SetLatch on? I originally thought of adding a function like SetAllLatches(ParallelContext *), and maybe that could work, but then I had what I think is a better idea, which is to introduce a notion of condition variables.
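As a standalone illustration (not PostgreSQL code), the multi-phase case maps naturally onto a condition variable. The sketch below assumes POSIX threads, with pthread_cond_t standing in for a shared-memory condition variable; PhaseBarrier, phase_barrier_wait, and run_phases are hypothetical names. A worker that finishes a phase early simply sleeps until the last participant arrives and broadcasts, so "finished phase 1 before everyone started it" is harmless, and a generation counter lets the same barrier be reused for 10 phases as easily as for 2.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_WORKERS 64
#define MAX_PHASES  64

/* Hypothetical reusable barrier built on a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cv;
    int nparticipants;
    int arrived;          /* participants that have finished the current phase */
    unsigned generation;  /* bumped each time the barrier opens */
} PhaseBarrier;

static void phase_barrier_wait(PhaseBarrier *b) {
    pthread_mutex_lock(&b->mutex);
    unsigned gen = b->generation;
    if (++b->arrived == b->nparticipants) {
        /* Last to arrive: open the barrier and wake everyone. */
        b->arrived = 0;
        b->generation++;
        pthread_cond_broadcast(&b->cv);
    } else {
        /* Recheck after every wakeup, since wakeups may be spurious. */
        while (gen == b->generation)
            pthread_cond_wait(&b->cv, &b->mutex);
    }
    pthread_mutex_unlock(&b->mutex);
}

static PhaseBarrier phase_barrier;
static atomic_int finished[MAX_PHASES];  /* workers done with each phase */
static atomic_int violations;            /* phase starts that came too early */
static int nworkers_g, nphases_g;

static void *worker(void *arg) {
    (void) arg;
    for (int phase = 0; phase < nphases_g; phase++) {
        /* Entering a phase is only legal once everyone finished the last one. */
        if (phase > 0 && atomic_load(&finished[phase - 1]) != nworkers_g)
            atomic_fetch_add(&violations, 1);
        /* ... the phase's real work would go here ... */
        atomic_fetch_add(&finished[phase], 1);
        phase_barrier_wait(&phase_barrier);
    }
    return NULL;
}

/* Run nworkers threads through nphases phases; return the violation count. */
int run_phases(int nworkers, int nphases) {
    pthread_t threads[MAX_WORKERS];
    assert(nworkers > 0 && nworkers <= MAX_WORKERS);
    assert(nphases > 0 && nphases <= MAX_PHASES);
    nworkers_g = nworkers;
    nphases_g = nphases;
    atomic_store(&violations, 0);
    for (int i = 0; i < nphases; i++)
        atomic_store(&finished[i], 0);
    pthread_mutex_init(&phase_barrier.mutex, NULL);
    pthread_cond_init(&phase_barrier.cv, NULL);
    phase_barrier.nparticipants = nworkers;
    phase_barrier.arrived = 0;
    phase_barrier.generation = 0;
    for (int i = 0; i < nworkers; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < nworkers; i++)
        pthread_join(threads[i], NULL);
    pthread_cond_destroy(&phase_barrier.cv);
    pthread_mutex_destroy(&phase_barrier.mutex);
    return atomic_load(&violations);
}
```

For example, run_phases(4, 10) drives four workers through ten phases, and the early-start counter stays at zero because each worker increments its phase counter before arriving at the barrier.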
Condition variables, of course, are a standard synchronization primitive:

https://en.wikipedia.org/wiki/Monitor_(synchronization)#Condition_variables_2

Basically, a condition variable has three operations: you can wait for the condition variable; you can signal the condition variable to wake up one waiter; or you can broadcast on the condition variable to wake up all waiters. Atomically with entering the wait, you must be able to check whether the condition is satisfied. So, in my implementation, a condition variable wait loop looks like this:

    for (;;)
    {
        ConditionVariablePrepareToSleep(cv);
        if (condition for which we are waiting is satisfied)
            break;
        ConditionVariableSleep();
    }
    ConditionVariableCancelSleep();

To wake up one waiter, another backend can call ConditionVariableSignal(cv); to wake up all waiters, ConditionVariableBroadcast(cv).

I am cautiously optimistic that this design will serve a wide variety of needs for parallel query development - basically anything that needs to wait for another process to reach a certain point in the computation that can be detected through changes in shared memory state.

The attached patch condition-variable-v1.patch implements this API. I originally open-coded the wait queue for this, but I've just finished rebasing it on top of Thomas Munro's proclist stuff, so before applying this patch you need the one from here:

https://www.postgresql.org/message-id/CAEepm=0vvr9zgwht67rwutfwmeby1gigptbk3xfpdbbgetz...@mail.gmail.com

At some point while hacking on this I realized that we could actually replace the io_in_progress locks with condition variables; the attached patch buffer-io-cv-v1.patch does this (it must be applied on top of the proclist patch from the above email and also on top of condition-variable-v1.patch). Using condition variables here seems to have a couple of advantages. First, it means that a backend waiting for buffer I/O to complete is interruptible.
Second, it fixes a long-standing bit of nastiness in AbortBufferIO: right now, if a backend that is doing buffer I/O aborts, the abort causes it to release all of its LWLocks, including the buffer I/O lock. Everyone waiting for that buffer busy-loops until the aborting process gets around to reacquiring the lock and updating the buffer state in AbortBufferIO. But if we replace the io_in_progress locks with condition variables, then that doesn't happen any more. Nobody is "holding" the condition variable, so it doesn't get "released" when the process doing I/O aborts. Instead, the waiters just keep sleeping until the aborting process reaches AbortBufferIO, which broadcasts on the condition variable and wakes everybody up; that seems a good deal nicer.

I'm very curious to know whether other people like this abstraction and whether they think it will be useful for things they want to do with parallel query (or otherwise). Comments welcome. Review appreciated. Other suggestions for how to handle this are cool, too.

Credit: These patches were written by me; an earlier version of condition-variable-v1.patch was reviewed and tested by Rahila Syed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
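The wait loop and the AbortBufferIO behavior described above can be modeled outside PostgreSQL. The sketch below is a hypothetical single-process analogue, not the code in condition-variable-v1.patch: ToyLatch stands in for a latch (a per-waiter flag plus a pthread mutex and condition), ToyCV is just a locked list of waiting latches, and the cv_*, wait_for_io, and run_abort_demo names are invented for illustration. Because a waiter joins the list before testing its condition, a broadcast that lands between the test and the sleep still finds it and sets its latch, so the wakeup is not lost. The demo plays the buffer I/O case: waiters sleep while a flag says I/O is in progress, and nothing wakes them until the "abort path" clears the flag and broadcasts.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#define MAX_WAITERS 32

/* Hypothetical stand-in for a PostgreSQL latch: a per-waiter wakeup flag. */
typedef struct { pthread_mutex_t mu; pthread_cond_t cond; bool is_set; } ToyLatch;
/* Toy condition variable: a mutex-protected list of waiting latches. */
typedef struct { pthread_mutex_t mu; ToyLatch *waiters[MAX_WAITERS]; int nwaiters; } ToyCV;

/* Caller holds cv->mu. Returns the slot holding 'me', or -1 if absent. */
static int cv_find_locked(ToyCV *cv, ToyLatch *me) {
    for (int i = 0; i < cv->nwaiters; i++)
        if (cv->waiters[i] == me)
            return i;
    return -1;
}

/* Join the wait list; idempotent, because the loop calls it on every pass. */
static void cv_prepare_to_sleep(ToyCV *cv, ToyLatch *me) {
    pthread_mutex_lock(&cv->mu);
    if (cv_find_locked(cv, me) < 0) {
        assert(cv->nwaiters < MAX_WAITERS);
        cv->waiters[cv->nwaiters++] = me;
    }
    pthread_mutex_unlock(&cv->mu);
}

/* Sleep until our latch is set, then reset it (cf. WaitLatch + ResetLatch). */
static void cv_sleep(ToyLatch *me) {
    pthread_mutex_lock(&me->mu);
    while (!me->is_set)
        pthread_cond_wait(&me->cond, &me->mu);
    me->is_set = false;
    pthread_mutex_unlock(&me->mu);
}

/* Leave the wait list unless a wakeup already removed us. */
static void cv_cancel_sleep(ToyCV *cv, ToyLatch *me) {
    pthread_mutex_lock(&cv->mu);
    int i = cv_find_locked(cv, me);
    if (i >= 0)
        cv->waiters[i] = cv->waiters[--cv->nwaiters];
    pthread_mutex_unlock(&cv->mu);
}

/* Wake every waiter. Latches are set while cv->mu is held, so a waiter's
 * cv_cancel_sleep() cannot return while its latch is still being touched. */
static void cv_broadcast(ToyCV *cv) {
    pthread_mutex_lock(&cv->mu);
    while (cv->nwaiters > 0) {
        ToyLatch *l = cv->waiters[--cv->nwaiters];
        pthread_mutex_lock(&l->mu);
        l->is_set = true;
        pthread_cond_signal(&l->cond);
        pthread_mutex_unlock(&l->mu);
    }
    pthread_mutex_unlock(&cv->mu);
}

/* Demo state: waiters sleep while a buffer's I/O is "in progress". */
static ToyCV io_cv = { .mu = PTHREAD_MUTEX_INITIALIZER };
static atomic_bool io_in_progress;
static atomic_int waiters_done;
static void *wait_for_io(void *arg) {
    ToyLatch me = { .is_set = false };
    pthread_mutex_init(&me.mu, NULL);
    pthread_cond_init(&me.cond, NULL);
    for (;;) {                          /* same shape as the loop in the mail */
        cv_prepare_to_sleep(&io_cv, &me);
        if (!atomic_load(&io_in_progress))
            break;
        cv_sleep(&me);
    }
    cv_cancel_sleep(&io_cv, &me);       /* must come before 'me' goes away */
    pthread_cond_destroy(&me.cond);
    pthread_mutex_destroy(&me.mu);
    atomic_fetch_add(&waiters_done, 1);
    return NULL;
}

/* Start nwaiters waiters, run the "abort" path, return how many finished. */
int run_abort_demo(int nwaiters) {
    pthread_t threads[MAX_WAITERS];
    assert(nwaiters > 0 && nwaiters <= MAX_WAITERS && io_cv.nwaiters == 0);
    atomic_store(&io_in_progress, true);
    atomic_store(&waiters_done, 0);
    for (int i = 0; i < nwaiters; i++)
        pthread_create(&threads[i], NULL, wait_for_io, NULL);
    /* Abort path: nothing was "held", so waiters stay asleep until now. */
    atomic_store(&io_in_progress, false);
    cv_broadcast(&io_cv);
    for (int i = 0; i < nwaiters; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&waiters_done);
}
```

A signal operation would differ from cv_broadcast only in removing and setting a single latch. In PostgreSQL the wait list lives in shared memory and the latches belong to separate processes, which this single-process sketch does not attempt to model.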
condition-variable-v1.patch
buffer-io-cv-v1.patch
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers