On Mon, Jan 22, 2018 at 6:45 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> FWIW, I don't think that that's really much of a difference.
>>
>> ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
>> similar to how _bt_end_parallel() calls
>> WaitForParallelWorkersToFinish() in the patch. The
>> _bt_leader_heapscan() condition variable wait for workers that you
>> refer to is quite a bit like how gather_readnext() behaves. It
>> generally checks to make sure that all tuple queues are done.
>> gather_readnext() can wait for developments using WaitLatch(), to make
>> sure every tuple queue is visited, with all output reliably consumed.
>>
>
> The difference lies in the fact that in gather_readnext, we use the
> tuple queue mechanism, which has the capability to detect that the
> workers have stopped/exited, whereas _bt_leader_heapscan doesn't have
> any such capability, so I think it will loop forever.
_bt_leader_heapscan() can detect when workers exit early, at least in
the vast majority of cases. It can do this simply by processing
interrupts and automatically propagating any error -- nothing special
about that. It can also detect when workers have finished successfully,
because of course, that's the main reason for its existence. What
remains, exactly?

I don't know that much about tuple queues, but from a quick read I
guess you might be talking about shm_mq_receive() +
shm_mq_wait_internal(). It's not obvious that that will work in all
cases ("Note that if handle == NULL, and the process fails to attach,
we'll potentially get stuck here forever"). Also, I don't see how this
addresses the parallel_leader_participation issue I raised.

--
Peter Geoghegan