On Wed, Jan 24, 2018 at 12:05 PM, Robert Haas <robertmh...@gmail.com> wrote:
> In Thomas's test case, he's using 4 workers, and if even one of those
> manages to start, then it'll probably do so after any fork failures
> have already occurred, and the error will be noticed when that process
> sends its first message to the leader through the error queue, because
> it'll send PROCSIG_PARALLEL_MESSAGE via all the queues. If all of the
> workers fail to start, then that doesn't help. But it still manages
> to detect the failure in that case because it reaches
> WaitForParallelWorkersToFinish, which we just patched.
>
> But how does that happen, anyway? Shouldn't it get stuck waiting for
> the tuple queues to which nobody's attached yet? The reason it
> doesn't is because ExecParallelCreateReaders() calls shm_mq_set_handle()
> for each queue, which causes the tuple queues to be regarded as detached
> if the worker fails to start. A detached tuple queue, in general, is not
> an error condition: it just means that worker has no more tuples.
This explains all the confusion. Amit told me that using a tuple queue
made all the difference here. Even still, it seemed surprising that we'd
rely on that at such a long distance from within nodeGather.c.

> I guess that works, but it seems more like blind luck than good
> design. Parallel CREATE INDEX fails to be as "lucky" as Gather
> because there's nothing in parallel CREATE INDEX that lets it skip
> waiting for a worker which doesn't start -- and in general it seems
> unacceptable to impose a coding requirement that all future parallel
> operations must fail to notice the difference between a worker that
> completed successfully and one that never ran at all.

+1.

> If we made the Gather node wait only for workers that actually reached
> the Gather -- either by using a Barrier or by some other technique --
> then this would be a lot less fragile. And the same kind of technique
> would work for parallel CREATE INDEX.

The use of a barrier has problems of its own [1], though, of which one
is unique to the parallel_leader_participation=off case. I thought that
you yourself agreed with this [2] -- do you?

Another argument against using a barrier in this way is that it seems
like way too much mechanism to solve a simple problem. Why should a
client of parallel.h not be able to rely on nworkers_launched (perhaps
only after verifying that it can be trusted)? That seems like a pretty
reasonable requirement for clients to have of any framework for
parallel imperative programming.

I think that we should implement "some other technique", instead of
using a barrier. As I've said, Amit's WaitForParallelWorkersToAttach()
idea seems like a promising "other technique".

[1] https://www.postgresql.org/message-id/caa4ek1+a0of4m231vbgpr_0ygg_bnmrgzlib7wqde-fybsy...@mail.gmail.com
[2] https://www.postgresql.org/message-id/CA+TgmoaF8UA8v8hP=ccoquc50pucpc8abj-_yyc++ygggjw...@mail.gmail.com
--
Peter Geoghegan