On Wed, Jan 24, 2018 at 12:05 PM, Robert Haas <robertmh...@gmail.com> wrote:
> In Thomas's test case, he's using 4 workers, and if even one of those
> manages to start, then it'll probably do so after any fork failures
> have already occurred, and the error will be noticed when that process
> sends its first message to the leader through the error queue, because
> it'll send PROCSIG_PARALLEL_MESSAGE via all the queues. If all of the
> workers fail to start, then that doesn't help. But it still manages
> to detect the failure in that case because it reaches
> WaitForParallelWorkersToFinish, which we just patched.
>
> But how does that happen, anyway? Shouldn't it get stuck waiting for
> the tuple queues to which nobody's attached yet? The reason it
> doesn't is because ExecParallelCreateReaders() calls shm_mq_set_handle()
> for each queue, which causes the tuple queues to be regarded as detached
> if the worker fails to start. A detached tuple queue, in general, is not
> an error condition: it just means that worker has no more tuples.
This explains all the confusion. Amit told me that using a tuple queue
made all the difference here. Even still, it seemed surprising that we'd
rely on that at such a long distance from within nodeGather.c.

> I guess that works, but it seems more like blind luck than good
> design. Parallel CREATE INDEX fails to be as "lucky" as Gather
> because there's nothing in parallel CREATE INDEX that lets it skip
> waiting for a worker which doesn't start -- and in general it seems
> unacceptable to impose a coding requirement that all future parallel
> operations must fail to notice the difference between a worker that
> completed successfully and one that never ran at all.

+1.

> If we made the Gather node wait only for workers that actually reached
> the Gather -- either by using a Barrier or by some other technique --
> then this would be a lot less fragile. And the same kind of technique
> would work for parallel CREATE INDEX.

The use of a barrier has problems of its own [1], though, of which one
is unique to the parallel_leader_participation=off case. I thought that
you yourself agreed with this [2] -- do you?

Another argument against using a barrier in this way is that it seems
like way too much mechanism to solve a simple problem. Why should a
client of parallel.h not be able to rely on nworkers_launched (perhaps
only after verifying that it can be trusted)? That seems like a pretty
reasonable requirement for clients to have of any framework for
parallel imperative programming.

I think that we should implement "some other technique", instead of
using a barrier. As I've said, Amit's WaitForParallelWorkersToAttach()
idea seems like a promising "other technique".

[1] https://www.postgresql.org/message-id/caa4ek1+a0of4m231vbgpr_0ygg_bnmrgzlib7wqde-fybsy...@mail.gmail.com
[2] https://www.postgresql.org/message-id/CA+TgmoaF8UA8v8hP=ccoquc50pucpc8abj-_yyc++ygggjw...@mail.gmail.com
--
Peter Geoghegan