Hi!

Ludovic Courtès <l...@gnu.org> skribis:

> Turns out that this happens when calling the ‘daemonize’ action on
> ‘root’.  I have a reproducer now and am investigating…

Good news: this is fixed in Shepherd commit
f4272d2f0f393d2aa3e9d76b36ab6aa5f2fc72c2!

The root cause is inconsistent semantics when mixing epoll, signalfd,
and fork, specifically this part from signalfd(2):

   epoll(7) semantics
       If a process adds (via epoll_ctl(2)) a signalfd file descriptor
       to an epoll(7) instance, then epoll_wait(2) returns events only
       for signals sent to that process.  In particular, if the process
       then uses fork(2) to create a child process, then the child will
       be able to read(2) signals that are sent to it using the
       signalfd file descriptor, but epoll_wait(2) will not indicate
       that the signalfd file descriptor is ready.  In this scenario, a
       possible workaround is that after the fork(2), the child process
       can close the signalfd file descriptor that it inherited from
       the parent process and then create another signalfd file
       descriptor and add it to the epoll instance.

       […]

The C program below illustrates this behavior:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/signal.h>
#include <sys/signalfd.h>
#include <sys/epoll.h>

int
main ()
{
  int ep, sfd;
  sigset_t signals;

  sigemptyset (&signals);
  sigaddset (&signals, SIGINT);
  sigaddset (&signals, SIGHUP);
  sigprocmask (SIG_BLOCK, &signals, NULL);
  sfd = signalfd (-1, &signals, SFD_CLOEXEC);

  ep = epoll_create1 (EPOLL_CLOEXEC);

  struct epoll_event events =
    { .events = EPOLLIN | EPOLLONESHOT, .data = NULL };
  epoll_ctl (ep, EPOLL_CTL_ADD, sfd, &events);
  epoll_wait (ep, &events, 1, 123);

  if (fork () == 0)
    {
      /* Quoth signalfd(2):

         If a process adds (via epoll_ctl(2)) a signalfd file
         descriptor to an epoll(7) instance, then epoll_wait(2)
         returns events only for signals sent to that process.  In
         particular, if the process then uses fork(2) to create a
         child process, then the child will be able to read(2) signals
         that are sent to it using the signalfd file descriptor, but
         epoll_wait(2) will not indicate that the signalfd file
         descriptor is ready.  */
      printf ("try this: kill -INT %i\n", getpid ());
      while (1)
        {
          struct signalfd_siginfo info;
          if (epoll_wait (ep, &events, 1, 777) > 0)
            {
              read (sfd, &info, sizeof info);
              printf ("got signal %i!\n", info.ssi_signo);
              epoll_ctl (ep, EPOLL_CTL_MOD, sfd, &events);
            }
        }
    }

  return 0;
}

Of course it took me a while to find out about this; I first looked at
things individually and didn’t expect the mixture to behave
inconsistently.

Maxim, let me know if it works for you!

Thanks,
Ludo’.