On Tue, Jan 23, 2018 at 8:43 AM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Mon, Jan 22, 2018 at 6:45 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>> FWIW, I don't think that that's really much of a difference.
>>>
>>> ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
>>> similar to how _bt_end_parallel() calls
>>> WaitForParallelWorkersToFinish() in the patch. The
>>> _bt_leader_heapscan() condition variable wait for workers that you
>>> refer to is quite a bit like how gather_readnext() behaves. It
>>> generally checks to make sure that all tuple queues are done.
>>> gather_readnext() can wait for developments using WaitLatch(), to make
>>> sure every tuple queue is visited, with all output reliably consumed.
>>>
>>
>> The difference lies in the fact that in gather_readnext, we use tuple
>> queue mechanism which has the capability to detect that the workers
>> are stopped/exited whereas _bt_leader_heapscan doesn't have any such
>> capability, so I think it will loop forever.
>
> _bt_leader_heapscan() can detect when workers exit early, at least in
> the vast majority of cases. It can do this simply by processing
> interrupts and automatically propagating any error -- nothing special
> about that. It can also detect when workers have finished
> successfully, because of course, that's the main reason for its
> existence. What remains, exactly?
>

Will it be able to detect a fork failure, or a worker that exits before
attaching to the error queue?  You could try this by forcing a fork
failure in do_start_bgworker and observing how _bt_leader_heapscan
behaves.  I would have tried it myself and reported the results, but the
latest patch doesn't seem to apply cleanly.

> I don't know that much about tuple queues, but from a quick read I
> guess you might be talking about shm_mq_receive() +
> shm_mq_wait_internal(). It's not obvious that that will work in all
> cases ("Note that if handle == NULL, and the process fails to attach,
> we'll potentially get stuck here forever"). Also, I don't see how this
> addresses the parallel_leader_participation issue I raised.
>

I am talking about shm_mq_receive->shm_mq_counterparty_gone.  In
shm_mq_counterparty_gone, the receiver can detect that the worker is
gone by calling GetBackgroundWorkerPid.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
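
To make the shm_mq_counterparty_gone point concrete, here is a minimal
standalone model of the kind of check being described.  It is a sketch,
not the PostgreSQL source: MockQueue, MockWorkerHandle,
mock_get_worker_status and counterparty_gone_model are hypothetical
names invented for illustration, and only the BGWH_* status names are
taken from PostgreSQL.  Representing a worker that never started as
BGWH_STOPPED is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Status names mirror PostgreSQL's BgwHandleStatus; the rest is a mock. */
typedef enum
{
	BGWH_STARTED,			/* worker is running */
	BGWH_NOT_YET_STARTED,	/* worker hasn't been started yet */
	BGWH_STOPPED,			/* worker has exited */
	BGWH_POSTMASTER_DIED	/* postmaster died; worker status unclear */
} BgwHandleStatus;

/* Hypothetical stand-in for a background worker handle. */
typedef struct
{
	BgwHandleStatus status;
} MockWorkerHandle;

/* Hypothetical stand-in for a shared memory message queue. */
typedef struct
{
	bool		detached;	/* has the counterparty detached? */
} MockQueue;

/* Hypothetical stand-in for asking the postmaster about a worker. */
static BgwHandleStatus
mock_get_worker_status(const MockWorkerHandle *handle)
{
	return handle->status;
}

/*
 * Return true if the other end of the queue is definitely gone.
 *
 * With a handle, a worker that is neither running nor still waiting to
 * be started is treated as gone, even if it never attached.  Without a
 * handle, only an explicit detach can be observed.
 */
static bool
counterparty_gone_model(MockQueue *mq, const MockWorkerHandle *handle)
{
	assert(mq != NULL);

	/* An explicit detach means the counterparty is gone. */
	if (mq->detached)
		return true;

	if (handle != NULL)
	{
		BgwHandleStatus status = mock_get_worker_status(handle);

		if (status != BGWH_STARTED && status != BGWH_NOT_YET_STARTED)
		{
			/* Remember the result so later callers see it too. */
			mq->detached = true;
			return true;
		}
	}

	/* Not definitively gone; the caller keeps waiting. */
	return false;
}
```

In this model, the handle == NULL branch is where the caveat quoted
above applies: with no handle, a worker that never attaches is
indistinguishable from one that is merely slow, so the caller would
keep waiting.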