On Tue, Jan 23, 2018 at 10:36 AM, Robert Haas <robertmh...@gmail.com> wrote:
> As Amit says, what remains is the case where fork() fails or the
> worker dies before it reaches the line in ParallelWorkerMain that
> reads shm_mq_set_sender(mq, MyProc). In those cases, no error will be
> signaled until you call WaitForParallelWorkersToFinish(). If you wait
> prior to that point for a number of workers equal to
> nworkers_launched, you will wait forever in those cases.
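
For illustration, here is a minimal pthreads sketch of that failure mode and of the barrier-style approach discussed in this thread. Everything in it is hypothetical: SimpleBarrier, the sb_* functions, run_scenario() and the fail_id knob are made up for this example. They are loosely modeled on the BarrierAttach() / BarrierArriveAndDetach() / BarrierArriveAndWait() protocol, not on PostgreSQL's actual Barrier code. The point is that the leader waits only for participants that actually attached, so a worker that dies before attaching cannot make it wait forever. Waiting for nworkers_launched arrivals would hang in exactly that case.

```c
#include <pthread.h>

/* Hypothetical simplified barrier (NOT PostgreSQL's barrier.c): it tracks how
 * many participants are attached and advances the phase once all of them
 * have arrived. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cv;
    int participants; /* currently attached participants */
    int arrived;      /* participants that arrived in the current phase */
    int phase;
} SimpleBarrier;

/* Caller must hold b->mutex: start the next phase and wake any waiters. */
static void sb_advance_locked(SimpleBarrier *b) {
    b->arrived = 0;
    b->phase++;
    pthread_cond_broadcast(&b->cv);
}

/* Attach as a participant; returns the phase current at attach time. */
static int sb_attach(SimpleBarrier *b) {
    pthread_mutex_lock(&b->mutex);
    b->participants++;
    int phase = b->phase;
    pthread_mutex_unlock(&b->mutex);
    return phase;
}

/* Stop participating; if everyone still attached has arrived, finish the phase. */
static void sb_arrive_and_detach(SimpleBarrier *b) {
    pthread_mutex_lock(&b->mutex);
    b->participants--;
    if (b->arrived > 0 && b->arrived == b->participants)
        sb_advance_locked(b);
    pthread_mutex_unlock(&b->mutex);
}

/* Arrive and block until the phase advances. */
static void sb_arrive_and_wait(SimpleBarrier *b) {
    pthread_mutex_lock(&b->mutex);
    int start = b->phase;
    b->arrived++;
    if (b->arrived == b->participants)
        sb_advance_locked(b);
    else
        while (b->phase == start)
            pthread_cond_wait(&b->cv, &b->mutex);
    pthread_mutex_unlock(&b->mutex);
}

typedef struct {
    SimpleBarrier *barrier;
    int id;
    int fail_id;    /* hypothetical: this worker "dies" before attaching */
    int *completed; /* protected by barrier->mutex */
} WorkerArg;

static void *worker_main(void *p) {
    WorkerArg *w = p;
    if (w->id == w->fail_id)
        return NULL; /* simulates fork() failure or death before attaching */
    if (sb_attach(w->barrier) != 0) {
        /* Too late: the leader already finished waiting, so just leave. */
        sb_arrive_and_detach(w->barrier);
        return NULL;
    }
    /* ... read and sort tuples here ... */
    pthread_mutex_lock(&w->barrier->mutex);
    (*w->completed)++;
    pthread_mutex_unlock(&w->barrier->mutex);
    sb_arrive_and_detach(w->barrier);
    return NULL;
}

#define MAX_WORKERS 16

/* Returns the phase seen by the leader after its wait; *completed receives the
 * number of workers that attached in phase 0 and finished their work. */
int run_scenario(int nworkers_launched, int fail_id, int *completed) {
    SimpleBarrier b;
    pthread_t threads[MAX_WORKERS];
    WorkerArg args[MAX_WORKERS];
    int created[MAX_WORKERS];
    int done = 0;

    pthread_mutex_init(&b.mutex, NULL);
    pthread_cond_init(&b.cv, NULL);
    b.participants = b.arrived = b.phase = 0;
    if (nworkers_launched > MAX_WORKERS)
        nworkers_launched = MAX_WORKERS;

    sb_attach(&b); /* leader attaches before launching workers */
    for (int i = 0; i < nworkers_launched; i++) {
        args[i] = (WorkerArg){ &b, i, fail_id, &done };
        created[i] = pthread_create(&threads[i], NULL, worker_main, &args[i]) == 0;
    }
    /* ... leader sorts its own share here ... */
    sb_arrive_and_wait(&b); /* returns even if worker fail_id never attached */

    pthread_mutex_lock(&b.mutex);
    int phase = b.phase;
    pthread_mutex_unlock(&b.mutex);
    for (int i = 0; i < nworkers_launched; i++)
        if (created[i])
            pthread_join(threads[i], NULL);
    *completed = done;
    pthread_mutex_destroy(&b.mutex);
    pthread_cond_destroy(&b.cv);
    return phase;
}
```

Built with -pthread, calling run_scenario(4, 2, &completed) from a main() should return 1 (the leader's wait completes) even though worker 2 never attaches. The value of completed can vary between runs, because a worker that starts after the leader's wait has finished sees a non-zero phase and exits without doing any work.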

Another option might be to actually call WaitForParallelWorkersToFinish()
in place of a condition variable or barrier, as Amit suggested at one
point.

> I am going to repeat my previous suggestion that we use a Barrier here.
> Given the discussion subsequent to my original proposal, this can be a
> lot simpler than what I suggested originally. Each worker does
> BarrierAttach() before beginning to read tuples (exiting if the phase
> returned is non-zero) and BarrierArriveAndDetach() when it's done
> sorting. The leader does BarrierAttach() before launching workers and
> BarrierArriveAndWait() when it's done sorting. If we don't do this,
> we're going to have to invent some other mechanism to count the
> participants that actually initialize successfully, but that seems
> like it's just duplicating code.

I think that this closes the door to leader non-participation as
anything other than a developer-only debug option, which might be
fine. If parallel_leader_participation=off (or some way of getting the
same behavior through a #define) is to be retained, then an artificial
wait is required as a substitute for the leader's participation as a
worker.

-- 
Peter Geoghegan