On Wed, Dec 6, 2017 at 8:02 PM, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, Dec 6, 2017 at 1:43 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: >> Looks good to me. > > Committed and back-patched to 9.4 where dynamic background workers > were introduced. >
Thanks. >>> Barring objections, I'll commit and >>> back-patch this; then we can deal with the other part of this >>> afterward. >> >> Sure, I will rebase on top of the commit unless you have some comments >> on the remaining part. > > I'm not in love with that part of the fix; the other parts of that if > statement are just testing variables, and a function call that takes > and releases an LWLock is a lot more expensive. Furthermore, that > test will often be hit in practice, because we'll often arrive at this > function before workers have actually finished. On top of that, we'll > typically arrive here having already communicated with the worker in > some way, such as by receiving tuples, which means that in most cases > we already know that the worker was alive at least at some point, and > therefore the extra test isn't necessary. We only need that test, if > I understand correctly, to cover the failure-to-launch case, which is > presumably very rare. > > All that having been said, I don't quite know how to do it better. It > would be easy enough to modify this function so that if it ever > received a result other then BGWH_NOT_YET_STARTED for a worker, it > wouldn't call this function again for that worker. However, that's > only mitigating the problem, not really fixing it. We'll still end up > frequently inquiring - admittedly only once - about the status of a > worker with which we've previously communicated via the tuple queue. > Maybe we just have to live with that problem, but do you (or does > anyone else) have an idea? > I think the optimization you are suggesting has a merit over what I have done in the patch. However, one point to note is that we are anyway going to call that function down in the path( WaitForParallelWorkersToExit->WaitForBackgroundWorkerShutdown->GetBackgroundWorkerPid), so we might want to take the advantage of calling it first time as well. We can actually cache the status of workers that have returned BGWH_STOPPED and use it later so that we don't need to make this function call again for workers which are already stopped. We can even do what Tom is suggesting to avoid calling it for workers which are known to be launched, but I am really not sure if that function call is costly enough that we need to maintain one extra state to avoid that. While looking into this code path, I wonder how much we are gaining by having two separate calls (WaitForParallelWorkersToFinish and WaitForParallelWorkersToExit) to check the status of workers after finishing the parallel execution? They are used together in rescan code path, so apparently there is no benefit calling them separately there. OTOH, during shutdown of nodes, it will often be the case that they will be called in a short duration after each other except for the case where we need to shut down the node for the early exit like when there is a limit clause or such. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com