On Tue, Sep 3, 2019 at 2:43 PM Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:
> From: Kyotaro Horiguchi [mailto:horikyota....@gmail.com]
> > Since we are allowing OPs to use arbitrary command as
> > archive_command, providing a replacement with non-standard signal
> > handling for a specific command doesn't seem a general solution
> > to me. Couldn't we have pg_system(a tentative name), which
> > intercepts SIGQUIT then sends SIGINT to children? Might be need
> > to resend SIGQUIT after some interval, though..
>
> The same idea that you referred to as pg_system occurred to me, too.  But I
> wondered if the archiver process can get the pid of its child (shell?
> archive_command?), while keeping the capabilities of system() (= the shell).
> Even if we fork() and then system(), doesn't the OS send SIGQUIT to any
> descendents of the archiver when postmaster sends SIGQUIT to the child
> process group?

So, to recap what's happening here, we have a tree of processes like this:

postmaster
-> archiver
   -> sh
      -> cp [user-supplied archiving command]

The archiver is a process group leader, because it called setsid().
The postmaster's signal_child() does kill(pid, ...) and also
kill(-pid, ...), so the kernel sends SIGQUIT to the archiver (twice),
sh and cp.  As for what they do with the signal, it depends on timing:

1.  The archiver normally exits immediately in pgarch_exit(), but while
system() is running, SIGQUIT and SIGINT are ignored (see POSIX).
2.  sh normally uses SIGINT to break out of loops etc, but while it's
waiting for a subprocess, it also ignores SIGQUIT and SIGINT (see
POSIX).
3.  cp inherits the default disposition and (unless it handles it
specially) dumps core.

I think the general idea here is that interactive shells and similar
things want to ignore signals from users typing ^C (SIGINT) or ^\
(SIGQUIT) so they can affect just the thing that's actually running at
this moment, not the tree of processes waiting.

Yeah, I guess we could have our own pg_system() function that does
roughly what system() does, namely fork(), then execl() in the child
and waitpid() in the parent, but the child could begin a new process
group with setsid() before running execl() (so that it no longer gets
SIGQUIT when the postmaster signals the archiver), and the parent
could record pg_system_child_pid when forking, and install a QUIT
handler that does kill(-pg_system_child_pid, SIGTERM), as well as
setting a flag that will cause its main loop to exit (but not before
it has run waitpid()).  With some carefully placed blocks and unblocks
and ignores, to avoid races.

That all sounds like a lot of work though, and it might be easier to
just make an exception and use SIGTERM to shut down the archiver, as I
think Tom was suggesting.

Unfortunately we have the same problem elsewhere, where we use
popen().
I just wrote a C program that does just "sleep(60)", ran it with COPY
FROM PROGRAM, then sent SIGQUIT to the postmaster, and got a dumped
core.

--
Thomas Munro
https://enterprisedb.com