On Fri, Aug 7, 2020 at 1:34 AM Robert Haas <robertmh...@gmail.com> wrote: > > On Tue, Jul 28, 2020 at 5:35 AM Bharath Rupireddy > <bharath.rupireddyforpostg...@gmail.com> wrote: > > The v4 patch looks good to me. Hang is not seen, make check and make > > check-world passes. I moved this to the committer for further review > > in https://commitfest.postgresql.org/29/2636/. > > I don't think I agree with this approach. In particular, I don't > understand the rationale for unblocking only SIGUSR1. Above, Vignesh > says that he feels that unblocking only that signal would be the right > approach, but no reason is given. I have two reasons why I suspect > it's not the right approach. One, it doesn't seem to be what we do > elsewhere; the only existing cases where we have special handling for > particular signals are SIGQUIT and SIGPIPE, and those places have > comments explaining the reason why they are handled in a special way. >
We intend to unblock SIGQUIT before sigsetjmp() in places like bgwriter, checkpointer, walwriter and walreceiver, but we only call sigdelset(&BlockSig, SIGQUIT);, Without PG_SETMASK(&BlockSig);, seems like we are not actually unblocking SIQUIT and quickdie() will never get called in these processes if (sigsetjmp(local_sigjmp_buf, 1) != 0){....}, if postmaster sends a SIGQUIT while these processes are doing clean up tasks in sigsetjmp(), it will not be received, and the postmaster later sends SIGKLL to kill from below places. /* * If we already sent SIGQUIT to children and they are slow to shut * down, it's time to send them SIGKILL. This doesn't happen * normally, but under certain conditions backends can get stuck while * shutting down. This is a last measure to get them unwedged. * * Note we also do this during recovery from a process crash. */ if ((Shutdown >= ImmediateShutdown || (FatalError && !SendStop)) && AbortStartTime != 0 && (now - AbortStartTime) >= SIGKILL_CHILDREN_AFTER_SECS) { /* We were gentle with them before. Not anymore */ TerminateChildren(SIGKILL); /* reset flag so we don't SIGKILL again */ AbortStartTime = 0; } Shouldn't we call PG_SETMASK(&BlockSig); to make it effective? Am I missing anything here? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com