On Wed, Jan 24, 2018 at 2:35 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Jan 23, 2018 at 11:15 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> On Mon, Jan 22, 2018 at 10:56 PM, Amit Kapila <amit.kapil...@gmail.com>
>> wrote:
>>
>> OK. I've committed this patch and back-patched it to 9.6.
>> Back-patching to 9.5 didn't look simple because nworkers_launched
>> doesn't exist on that branch, and it doesn't matter very much anyway
>> since the infrastructure has no in-core users in 9.5.
>
> I was just talking to Thomas, and one or the other of us realized that
> this doesn't completely fix the problem.  I think that if a worker
> fails, even by some unorthodox mechanism like an abrupt proc_exit(1),
> after the point where it attached to the error queue, this patch is
> sufficient to make that reliably turn into an error.  But what if it
> fails before that - e.g. fork() failure?  In that case, this process
> will catch the problem if the calling process calls
> WaitForParallelWorkersToFinish, but does that actually happen in any
> interesting cases?  Maybe not.
>
> In the case of Gather, for example, we won't
> WaitForParallelWorkersToFinish() until we've read all the tuples.  If
> there's a tuple queue that does not yet have a sender, then we'll just
> wait for it to get one, even if the sender fell victim to a fork
> failure and is never going to show up.
>

Hmm, I think that case will be addressed, because the tuple queues can
detect that the worker never attached.  That happens in the code path
shm_mq_receive() -> shm_mq_counterparty_gone(): shm_mq_counterparty_gone()
can detect that the worker is gone by using GetBackgroundWorkerPid().
Moreover, I had manually tested this particular case before saying your
patch is fine.  Do you have some other case in mind that I am missing?
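For reference, a simplified sketch of that check in
src/backend/storage/ipc/shm_mq.c (paraphrased, not a verbatim quote of
the source):

static bool
shm_mq_counterparty_gone(shm_mq *mq, BackgroundWorkerHandle *handle)
{
    pid_t       pid;

    /* If the queue has already been detached, the counterparty is gone. */
    if (mq->mq_detached)
        return true;

    /* If we have a worker handle, ask whether the worker is still alive. */
    if (handle != NULL)
    {
        BgwHandleStatus status;

        status = GetBackgroundWorkerPid(handle, &pid);
        if (status != BGWH_STARTED && status != BGWH_NOT_YET_STARTED)
        {
            /*
             * Worker is neither running nor still expected to start;
             * treat the queue as detached.
             */
            mq->mq_detached = true;
            return true;
        }
    }

    return false;
}

So, as long as GetBackgroundWorkerPid() reports the failed worker as
stopped, a receiver that was given the worker's BackgroundWorkerHandle
should error out rather than wait forever for a sender that is never
going to attach.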
> We could *almost* fix this by having execParallel.c include a Barrier
> in the DSM, similar to what I proposed for parallel CREATE INDEX.
> When a worker starts, it attaches to the barrier; when it exits, it
> detaches.  Instead of looping until the leader is done and all the
> tuple queues are detached, Gather or Gather Merge could loop until the
> leader is done and everyone detaches from the barrier.  But that would
> require changes to the Barrier API, which isn't prepared to have the
> caller alternate waiting with other activity like reading the tuple
> queues, which would clearly be necessary in this case.  Moreover, it
> doesn't work at all when parallel_leader_participation=off, because in
> that case we'll again just wait forever for some worker to show up,
> and if none of them do, then we're hosed.
>
> One idea to fix this is to add a new function in parallel.c with a
> name like CheckForFailedWorkers() that Gather and Gather Merge call
> just before they WaitLatch().  If a worker fails to start, the
> postmaster will send SIGUSR1 to the leader, which will set the latch,
> so we can count on the function being run after every worker death.
> The function would go through and check each worker for which
> pcxt->worker[i].error_mqh != NULL && !pcxt->any_message_received[i] to
> see whether GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid) ==
> BGWH_STOPPED.  If so, ERROR.
>
> This lets us *run* for arbitrary periods of time without detecting a
> fork failure, but before we *wait* we will notice.  Getting more
> aggressive than that sounds expensive, but I'm open to ideas.
>
> If we adopt this approach, then Peter's patch could probably make use
> of it, too.  It's a little tricky because ConditionVariableSleep()
> tries not to return spuriously, so we'd either have to change that or
> stick CheckForFailedWorkers() into that function's loop.  I suspect
> the former is a better approach.  Or maybe that patch should still use
> the Barrier approach and avoid all of this annoyance.
>

I think we can discuss your proposed solution once the problem is
clear.  As of now, it is not clear to me whether there is any
remaining problem.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com