Re: [HACKERS] parallel.c oblivion of worker-startup failures

Robert Haas Wed, 06 Dec 2017 06:33:32 -0800

On Wed, Dec 6, 2017 at 1:43 AM, Amit Kapila <[email protected]> wrote:
> Looks good to me.


Committed and back-patched to 9.4 where dynamic background workers
were introduced.

>> Barring objections, I'll commit and
>> back-patch this; then we can deal with the other part of this
>> afterward.
>
> Sure, I will rebase on top of the commit unless you have some comments
> on the remaining part.

I'm not in love with that part of the fix; the other parts of that if
statement are just testing variables, and a function call that takes
and releases an LWLock is a lot more expensive.  Furthermore, that
test will often be hit in practice, because we'll often arrive at this
function before workers have actually finished.  On top of that, we'll
typically arrive here having already communicated with the worker in
some way, such as by receiving tuples, which means that in most cases
we already know that the worker was alive at least at some point, and
therefore the extra test isn't necessary.  We only need that test, if
I understand correctly, to cover the failure-to-launch case, which is
presumably very rare.

All that having been said, I don't quite know how to do it better.  It
would be easy enough to modify this function so that if it ever
received a result other then BGWH_NOT_YET_STARTED for a worker, it
wouldn't call this function again for that worker.  However, that's
only mitigating the problem, not really fixing it.  We'll still end up
frequently inquiring - admittedly only once - about the status of a
worker with which we've previously communicated via the tuple queue.
Maybe we just have to live with that problem, but do you (or does
anyone else) have an idea?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] parallel.c oblivion of worker-startup failures

Reply via email to