On Thu, Dec 14, 2017 at 3:05 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Dec 13, 2017 at 1:41 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>
>> This also doesn't appear to be completely safe. If we add
>> proc_exit(1) after attaching to error queue (say after
>> pq_set_parallel_master) in the worker, then it will lead to *hang* as
>> anyone_alive will still be false and as it will find that the sender
>> is set for the error queue, it won't return any failure. Now, I think
>> even if we check worker status (which will be stopped) and break after
>> the new error condition, it won't work as it will still return zero
>> rows in the case reported by you above.
>
> Hmm, there might still be a problem there. I was thinking that once
> the leader attaches to the queue, we can rely on the leader reaching
> "ERROR: lost connection to parallel worker" in HandleParallelMessages.
> However, that might not work because nothing sets
> ParallelMessagePending in that case. The worker will have detached
> the queue via shm_mq_detach_callback, but the leader will only
> discover the detach if it actually tries to read from that queue.
>
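
The case described above can be sketched with a toy model of the leader's
check. This is only an illustration and not PostgreSQL code: ToyWorker,
LeaderOutcome, and toy_leader_check are made-up names, and the three
booleans are an assumed simplification of the state under discussion.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-worker state; not a real PostgreSQL type. */
typedef struct ToyWorker
{
    bool stopped;         /* worker is reported as stopped */
    bool sender_was_set;  /* worker attached to the error queue before exiting */
    bool queue_detached;  /* worker detached from the error queue on exit */
} ToyWorker;

typedef enum
{
    LEADER_ERROR,         /* leader notices the failure and raises an error */
    LEADER_KEEPS_WAITING  /* leader sees no reason to stop waiting */
} LeaderOutcome;

/*
 * Assumed behavior: a worker that stopped without ever attaching is treated
 * as a failure, while a detach is noticed only if something (modeled here by
 * message_pending) prompts the leader to read from the queue.
 */
LeaderOutcome
toy_leader_check(const ToyWorker *w, bool message_pending)
{
    if (!w->stopped)
        return LEADER_KEEPS_WAITING;  /* worker is still running */
    if (!w->sender_was_set)
        return LEADER_ERROR;          /* died before attaching: detectable */
    if (message_pending && w->queue_detached)
        return LEADER_ERROR;          /* a read was attempted: lost connection */
    return LEADER_KEEPS_WAITING;      /* nothing prompts a read: hang */
}
```

For a worker that attached and then exited, toy_leader_check(&w, false)
returns LEADER_KEEPS_WAITING; it returns LEADER_ERROR only when
message_pending is true. In this model, a worker that finished normally
would leave the same three flags behind, so the flags alone cannot
separate the two cases.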

I think it would have been much easier to fix this problem if we had
some way to differentiate whether the worker has stopped gracefully or
not. Do you think it makes sense to introduce such a state in the
background worker machinery?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com