On Mon, Jan 22, 2018 at 3:52 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> The difference is that nodeGather.c doesn't have any logic like the
> one you have in _bt_leader_heapscan where the patch waits for each
> worker to increment nparticipantsdone.  For Gather node, we do such a
> thing (wait for all workers to finish) by calling
> WaitForParallelWorkersToFinish which will have the capability after
> Robert's patch to detect if any worker is exited abnormally (fork
> failure or failed before attaching to the error queue).

FWIW, I don't think that that's really much of a difference.

ExecParallelFinish() calls WaitForParallelWorkersToFinish(), which is
similar to how _bt_end_parallel() calls
WaitForParallelWorkersToFinish() in the patch. The
_bt_leader_heapscan() condition variable wait for workers that you
refer to is quite a bit like how gather_readnext() behaves. It
generally checks to make sure that all tuple queues are done.
gather_readnext() can wait for developments using WaitLatch(), to make
sure every tuple queue is visited, with all output reliably consumed.

This doesn't look all that similar to _bt_leader_heapscan(), I
suppose, but I think that that's only because it's normal for all
output to become available all at once for nbtsort.c workers. The
startup cost is close to or actually the same as the total cost, as it
*always* is for sort nodes.

-- 
Peter Geoghegan