Detection of SIGCHLD from the container init by the monitor process which spawned it is done during lxc_poll. If the monitor is slow and the init (especially if using lxc-init to run /bin/true) exits quickly, it can send its SIGCHLD before lxc_poll starts. In that case lxc_poll ends up hanging forever waiting for the SIGCHLD, while the init process is a zombie waiting to be reaped.
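
For anyone who wants to poke at this state outside of LXC, here is a minimal sketch, not LXC code. It assumes a plain fork() child standing in for the container init, and the helper names child_already_exited() and spawn_quick_child() are made up for illustration. It uses waitid() with WNOWAIT, which reports that a child has exited without reaping it, so a later waitpid() still succeeds. This is a different technique from the /proc check in the patch that follows.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of LXC.
 * Returns 1 if child `pid` has already exited and is waiting to be reaped,
 * 0 if it is still running, -1 on error (for example, not our child).
 * WNOHANG makes the call non-blocking; WNOWAIT leaves the child waitable.
 */
static int child_already_exited(pid_t pid)
{
	siginfo_t info;

	/* si_pid stays 0 when no child has changed state */
	memset(&info, 0, sizeof(info));
	if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) < 0)
		return -1;
	return info.si_pid == pid ? 1 : 0;
}

/*
 * Hypothetical stand-in for a container init that exits at once,
 * like /bin/true run under lxc-init. Returns the child pid, or -1.
 */
static pid_t spawn_quick_child(void)
{
	pid_t pid = fork();

	if (pid == 0)
		_exit(0);
	return pid;
}
```

A slow parent can call spawn_quick_child(), sleep, and then call child_already_exited() before it starts waiting on signals; a return of 1 means the child is already a zombie.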

If you want to verify that race, you can simply add a sleep(3) right
after the lxc_set_state(... RUNNING) in lxc_spawn, then do

	sudo lxc-execute -n somecontainer -- true

I suspect a clean way to handle this is to use sigaction between
lxc_spawn and lxc_poll to set a flag if the SIGCHLD from the container
init is detected. However, I haven't coded that fix yet. If someone
wants to do that before I get to it, that would be great.

In the meantime, another fix which works is below. On the one hand it
feels ugly because it's not using signals. On the other hand, the
sigaction method will be more complicated and less obvious...

One thing I'm not sure of (and which will complicate either approach)
is whether signals delivered between the lxc_mainloop_add_handler()
call and the actual running of lxc_mainloop() are queued up and then
handled in lxc_mainloop().

This should fix at least 2 bugs,

https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1134923
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1144873

and maybe a third

https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1124526

Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com>
---
 src/lxc/start.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index f0e82a3..75d1fd6 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -220,6 +220,28 @@ static int setup_signal_fd(sigset_t *oldmask)
 	return fd;
 }
 
+static int is_zombie(int pid)
+{
+	char path[PATH_MAX];
+	FILE *f;
+	int ret, junkint;
+	char state;
+
+	ret = snprintf(path, PATH_MAX, "/proc/%d/stat", pid);
+	if (ret < 0 || ret >= PATH_MAX)
+		return 0;
+	f = fopen(path, "r");
+	if (!f) // if it doesn't exist, process doesn't exist...
+		return 0;
+	ret = fscanf(f, "%d %s %c", &junkint, path, &state);
+	fclose(f);
+	if (ret != 3)
+		return 0;
+	if (state == 'Z')
+		return 1;
+	return 0;
+}
+
 static int signal_handler(int fd, void *data,
 			   struct lxc_epoll_descr *descr)
 {
@@ -358,6 +380,7 @@ int lxc_poll(const char *name, struct lxc_handler *handler)
 	int sigfd = handler->sigfd;
 	int pid = handler->pid;
 	struct lxc_epoll_descr descr;
+	int ret = -1;
 
 	if (lxc_mainloop_open(&descr)) {
 		ERROR("failed to create mainloop");
@@ -390,13 +413,22 @@ int lxc_poll(const char *name, struct lxc_handler *handler)
 #endif
 	}
 
+	ret = 0;
+	/* more robustness, protect ourself from a SIGCHLD sent
+	 * by a process different from the container init
+	 */
+	if (is_zombie(handler->pid)) {
+		INFO("init has already exited");
+		goto out_mainloop_open;
+	}
+
 	return lxc_mainloop(&descr);
 
 out_mainloop_open:
 	lxc_mainloop_close(&descr);
 out_sigfd:
 	close(sigfd);
-	return -1;
+	return ret;
 }
 
 extern int lxc_caps_check(void);
-- 
1.8.1.2
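
A note on the is_zombie() parsing in the patch, offered as a sketch rather than a replacement: the second field of /proc/<pid>/stat is the command name in parentheses, and it may contain spaces, in which case "%d %s %c" would likely read the wrong character as the state. One common way around this is to look for the last ')' on the line and take the character after the following space. The names stat_line_state() and is_zombie_sketch() below are hypothetical and exist only for this illustration.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: given one line in /proc/<pid>/stat format,
 * return the state character, or 0 if the line cannot be parsed.
 * The state is the first field after the closing ')' of the command name.
 */
static char stat_line_state(const char *line)
{
	const char *p = strrchr(line, ')');

	if (!p || p[1] != ' ' || p[2] == '\0')
		return 0;
	return p[2];
}

/*
 * Hypothetical variant of is_zombie() built on the helper above.
 * Returns 1 if the process is a zombie, 0 otherwise (including when
 * the process or /proc does not exist).
 */
static int is_zombie_sketch(pid_t pid)
{
	char path[64];
	char line[512];
	FILE *f;
	int ret;

	ret = snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
	if (ret < 0 || (size_t)ret >= sizeof(path))
		return 0;
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return 0;
	}
	fclose(f);
	return stat_line_state(line) == 'Z';
}
```

The line buffer only needs to hold the start of the stat line, since the pid, command name and state are the first three fields.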