On Fri, Jan 26, 2018 at 11:30 AM, Amit Kapila <amit.kapil...@gmail.com> wrote: > On Thu, Jan 25, 2018 at 1:24 AM, Peter Geoghegan <p...@bowt.ie> wrote: >> On Tue, Jan 23, 2018 at 9:43 PM, Amit Kapila <amit.kapil...@gmail.com> wrote: >>> Right, but what if the worker dies due to something proc_exit(1) or >>> something like that before calling BarrierArriveAndWait. I think this >>> is part of the problem we have solved in >>> WaitForParallelWorkersToFinish such that if the worker exits abruptly >>> at any point due to some reason, the system should not hang. >> >> I have used Thomas' chaos-monkey-fork-process.patch to verify: >> >> 1. The problem of fork failure causing nbtsort.c to wait forever is a >> real problem. Sure enough, the coding pattern within >> _bt_leader_heapscan() can cause us to wait forever even with commit >> 2badb5afb89cd569500ef7c3b23c7a9d11718f2f, more or less as a >> consequence of the patch not using tuple queues (it uses the new >> tuplesort sharing thing instead). >> >> 2. Simply adding a single call to WaitForParallelWorkersToFinish() >> within _bt_leader_heapscan() before waiting on our condition variable >> fixes the problem -- errors are reliably propagated, and we never end >> up waiting forever. >> >> 3. This short term fix works just as well with >> parallel_leader_participation=off. >> >> At this point, my preferred solution is for someone to go implement >> Amit's WaitForParallelWorkersToAttach() idea [1] (Amit himself seems >> like the logical person for the job). >> > > I can implement it and share a prototype patch with you which you can > use to test parallel sort stuff. I would like to highlight the > difference which you will see with WaitForParallelWorkersToAttach as > compare to WaitForParallelWorkersToFinish() is that the former will > give you how many of nworkers_launched workers are actually launched > whereas latter gives an error if any of the expected workers is not > launched. I feel former is good and your proposed way of calling it > after the leader is done with its work has alleviated the minor > disadvantage of this API which is that we need for workers to startup. >
/we need for workers to startup./we need to wait for workers to startup. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com