Thanks Craig! On Fri, Oct 23, 2020 at 9:37 AM Craig Ringer <craig.rin...@enterprisedb.com> wrote: > > src/test/modules/test_shm_mq/worker.c appears to do the right thing the wrong > way - it has its own custom handler instead of using die() or > SignalHandlerForShutdownRequest(). >
handle_sigterm() and die() are similar except that die() has extra handling(below) for exiting immediately when waiting for input/command from the client. /* * If we're in single user mode, we want to quit immediately - we can't * rely on latches as they wouldn't work when stdin/stdout is a file. * Rather ugly, but it's unlikely to be worthwhile to invest much more * effort just for the benefit of single user mode. */ if (DoingCommandRead && whereToSendOutput != DestRemote) ProcessInterrupts(); Having this extra handling is correct for normal backends as they can connect directly to clients for reading input commands, the above if condition may become true and ProcessInterrupts() may be called to handle the SIGTERM immediately. Note that DoingCommandRead can never be true in bg workers as it's being set to true only in normal backend PostgresMain(). If we use die()(instead of custom SIGTERM handlers such as handle_sigterm()) in a bg worker/non-backend process, there are no chances that the above part of the code gets hit and the interrupts are processed immediately. And also here are the bg worker process that use die() instead of their own custom handlers: parallel workers, autoprewarm worker, autovacuum worker, logical replication launcher and apply worker, wal sender process I think we can also use die() instead of handle_sigterm() in test_shm_mq/worker.c. I attached a patch for this change. > > In contrast src/test/modules/worker_spi/worker_spi.c looks plain wrong. > Especially since it's quoted as an example of how to do things right. It > won't respond to SIGTERM at all while it's executing a query from its queue, > no matter how long that query takes or whether it blocks. It can inhibit even > postmaster shutdown as a result. > Postmaster sends SIGTERM to all children(backends and bgworkers) for both smart shutdown(wait for children to end their work, then shut down.) and fast shutdown(rollback active transactions and shutdown when they are gone.) For immediate shutdown SIGQUIT is sent to children.(see pmdie()). For each bg worker(so is for worker_spi.c), SignalHandlerForCrashExit() is set as SIGQUIT handler in InitPostmasterChild(), which does nothing but exits immediately. We can not use quickdie() here because as a bg worker we don't have to/can not send anything to client. We are good with SignalHandlerForCrashExit() as with all other bg workers. Yes, if having worker_spi_sigterm/SignalHandlerForShutdownRequest, sometimes(as explained above) the worker_spi worker may not respond to SIGTERM. I think we should be having die() as SIGTERM handler here (as with normal backend and parallel workers). Attaching another patch that has replaced custom SIGTERM handler worker_spi_sigterm() with die() and custom SIGHUP handler worker_spi_sighup() with standard SignalHandlerForConfigReload(). > > I was going to lob off a quick patch to fix this by making both use > quickdie() for SIGQUIT and die() for SIGTERM, but after reading for a bit I'm > no longer sure what the right choice even is. I'd welcome some opinions. > We can not use quickdie() here because as a bg worker we don't have to/can not send anything to client. We are good with SignalHandlerForCrashExit() as with all other bg workers. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
v1-0001-Use-die-instead-of-custom-signal-handler-in-test_shm_mq-worker.patch
Description: Binary data
v1-0001-Use-standard-SIGHUP-handler-and-SIGTERM-handler-die-in-worker_spi.patch
Description: Binary data