On Fri, Feb 23, 2018 at 8:48 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> On Fri, Feb 23, 2018 at 3:29 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Thu, Feb 22, 2018 at 10:35 PM, Robert Haas <robertmh...@gmail.com> wrote:
>>> On Thu, Feb 22, 2018 at 7:54 AM, Thomas Munro
>>>> PS I noticed that for BecomeLockGroupMember() we say "If we can't
>>>> join the lock group, the leader has gone away, so just exit quietly"
>>>> but for various other similar things we spew errors (the most commonly
>>>> seen one being "ERROR: could not map dynamic shared memory segment").
>>>> Intentional?
>>>
>>> I suppose I thought that if we failed to map the dynamic shared memory
>>> segment, it might be down to any one of several causes; whereas if we
>>> fail to join the lock group, it must be because the leader has already
>>> exited. There might be a flaw in that thinking, though.
>>>
>>
>> By the way, in which case can the leader exit early? As of now, we do
>> wait for the workers to end, both before the query is finished and in
>> error cases.
>
> create table foo as select generate_series(1, 10)::int a;
> alter table foo set (parallel_workers = 2);
> set parallel_setup_cost = 0;
> set parallel_tuple_cost = 0;
> select count(a / 0) from foo;
>
> That reliably gives me:
> ERROR: division by zero [from leader]
> ERROR: could not map dynamic shared memory segment [from workers]
>
> I thought this was coming from resource manager cleanup, but you're
> right: that happens after we wait for all workers to finish. Perhaps
> this is a race within DestroyParallelContext() itself: when it is
> called by AtEOXact_Parallel() during an abort, it asks the postmaster
> to SIGTERM the workers, then it immediately detaches from the DSM
> segment, and then it waits for the worker to start up.
>
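The ordering described above can be illustrated with a toy model. This is not PostgreSQL code: segment_attach(), segment_detach() and simulate() are hypothetical names, and the model assumes that the leader is the only process attached when it detaches (so detaching destroys the segment), and that a worker which has already been launched still runs its startup attach. It only shows why "detach, then wait" lets a late-starting worker fail to map the segment, whereas "wait, then detach" does not.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a DSM segment: it is destroyed when the
 * last attached process detaches. */
typedef struct {
    int  refcount;
    bool destroyed;
} Segment;

/* Returns false if the segment is already gone; in a real worker this
 * is roughly where "could not map dynamic shared memory segment" would
 * be reported. */
bool segment_attach(Segment *seg)
{
    if (seg->destroyed)
        return false;
    seg->refcount++;
    return true;
}

/* Drop one reference; the last detach destroys the segment. */
void segment_detach(Segment *seg)
{
    if (--seg->refcount == 0)
        seg->destroyed = true;
}

/* Returns true if a worker that starts up late manages to attach.
 * wait_before_detach == false models "detach immediately, then wait";
 * wait_before_detach == true models "wait for the worker first"
 * (assumption: the worker gets to run its attach while the leader waits). */
bool simulate(bool wait_before_detach)
{
    Segment seg = { .refcount = 1, .destroyed = false }; /* leader attached */
    bool worker_attached;

    if (wait_before_detach) {
        worker_attached = segment_attach(&seg);  /* worker starts up */
        if (worker_attached)
            segment_detach(&seg);                /* worker exits */
        segment_detach(&seg);                    /* leader detaches last */
    } else {
        segment_detach(&seg);                    /* leader detaches first */
        worker_attached = segment_attach(&seg);  /* worker starts up too late */
    }
    return worker_attached;
}
```

In this model simulate(false) returns false (the worker finds the segment already destroyed) and simulate(true) returns true.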
I guess you mean to say that it waits for the workers to shut down/exit. Why would it wait for startup at that stage?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com