The following kernel patch fixes it for me; I will send it to lkml:

diff --git a/debian.master/changelog b/debian.master/changelog
index f8f7a35a..081e666 100644
--- a/debian.master/changelog
+++ b/debian.master/changelog
@@ -1,3 +1,9 @@
+linux (3.11.0-4.9debug1) saucy; urgency=low
+
+  * debug 1
+
+ -- Serge Hallyn <serge@tangerine.buildd>  Thu, 29 Aug 2013 13:34:43 +0000
+
 linux (3.11.0-4.9) saucy; urgency=low
 
   [ Tim Gardner ]
diff --git a/debian/rules b/debian/rules
index 2d3358b..f87f26c 100755
--- a/debian/rules
+++ b/debian/rules
@@ -13,6 +13,8 @@ DEBIAN=$(shell awk -F= '($$1 == "DEBIAN") { print $$2 }' <debian/debian.env)
 # with the kernel build.
 unexport CFLAGS
 unexport LDFLAGS
+export skipmodules=true
+export skipabi=true
 export LC_ALL=C
 export SHELL=/bin/bash -e
 
diff --git a/debian/scripts/module-check b/debian/scripts/module-check
index c754ea3..280b6e9 100755
--- a/debian/scripts/module-check
+++ b/debian/scripts/module-check
@@ -4,6 +4,7 @@ $flavour = shift;
 $prev_abidir = shift;
 $abidir = shift;
 $skipmodule = shift;
+$skipmodule = 1;
 
 print "II: Checking modules for $flavour...";
 
diff --git a/kernel/pid.c b/kernel/pid.c
index 66505c1..3cccab3 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -275,6 +275,10 @@ void free_pid(struct pid *pid)
 		case 0:
 			schedule_work(&ns->proc_work);
 			break;
+		default:
+			if (ns->child_reaper->flags & PF_EXITING)
+				wake_up_process(ns->child_reaper);
+			break;
 		}
 	}
 	spin_unlock_irqrestore(&pidmap_lock, flags);
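The idea behind the kernel/pid.c hunk (when free_pid() drops a pid and the namespace's child reaper is exiting, wake the reaper so it can re-check what it is waiting for) can be illustrated with a userspace analogy. This is a sketch under stated assumptions, not kernel code: it assumes the exiting reaper sleeps until the namespace's pid count drops to a target and re-checks that count on every wakeup. struct toy_ns, toy_free_pid(), toy_reaper_wait() and toy_run() are hypothetical names; a mutex stands in for pidmap_lock and a condition variable for wake_up_process().

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pid namespace (userspace analogy only). */
struct toy_ns {
    pthread_mutex_t lock;      /* plays the role of pidmap_lock */
    pthread_cond_t reaper_cv;  /* plays the role of wake_up_process() */
    int nr_hashed;             /* pids still hashed in the namespace */
    bool reaper_exiting;       /* plays the role of PF_EXITING */
};

/* Analogue of free_pid() with the patched default: branch: after dropping
 * a pid, wake the reaper if it is exiting so it re-checks its condition. */
static void toy_free_pid(struct toy_ns *ns)
{
    pthread_mutex_lock(&ns->lock);
    ns->nr_hashed--;
    if (ns->reaper_exiting)
        pthread_cond_broadcast(&ns->reaper_cv);
    pthread_mutex_unlock(&ns->lock);
}

/* Analogue of the exiting reaper (assumed behaviour): sleep until at most
 * `target` pids remain, re-checking the count after every wakeup. */
static int toy_reaper_wait(struct toy_ns *ns, int target)
{
    pthread_mutex_lock(&ns->lock);
    ns->reaper_exiting = true;
    while (ns->nr_hashed > target)
        pthread_cond_wait(&ns->reaper_cv, &ns->lock);
    int left = ns->nr_hashed;
    pthread_mutex_unlock(&ns->lock);
    return left;
}

static void *toy_exiting_task(void *arg)
{
    toy_free_pid(arg);
    return NULL;
}

/* Start `tasks` threads that each free one pid, then act as the reaper
 * (which holds the last pid) and wait for them. Returns the number of
 * pids left when the reaper wakes, or -1 on bad input. */
static int toy_run(int tasks)
{
    enum { MAX_TASKS = 64 };
    if (tasks < 0 || tasks > MAX_TASKS)
        return -1;

    struct toy_ns ns;
    pthread_mutex_init(&ns.lock, NULL);
    pthread_cond_init(&ns.reaper_cv, NULL);
    ns.nr_hashed = tasks + 1;  /* the tasks plus the reaper itself */
    ns.reaper_exiting = false;

    pthread_t tids[MAX_TASKS];
    int started = 0;
    for (int i = 0; i < tasks; i++) {
        if (pthread_create(&tids[started], NULL, toy_exiting_task, &ns) == 0)
            started++;
        else
            toy_free_pid(&ns);  /* could not start a thread: free inline */
    }

    int left = toy_reaper_wait(&ns, 1);
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&ns.reaper_cv);
    pthread_mutex_destroy(&ns.lock);
    return left;
}
```

For example, toy_run(8) returns 1: eight threads each free one pid, and the reaper returns once only its own remains. Because the reaper sets its flag and tests the count under the same lock the freeing threads take, a thread that frees its pid before the reaper starts waiting cannot cause a lost wakeup; the reaper simply sees the already-decremented count.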
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1168526

Title:
  race condition causing lxc to not detect container init process exit

Status in “linux” package in Ubuntu:
  Confirmed
Status in “lxc” package in Ubuntu:
  Fix Committed

Bug description:

For the purposes of the repro, my lxc init process is node.js v0.11.0 (built from source) with a single line:

    process.exit(0);

When it runs in lxc, lxc sometimes doesn't exit: lxc-start remains the parent of a defunct node process, neither reaping it nor exiting.

I've made a custom build of lxc 0.9.0 to extract more information about this, adding only an INFO line, as follows:

start.c:

    if (ret != sizeof(siginfo)) {
        ERROR("unexpected siginfo size");
        return -1;
    }
+   INFO("got signal %d from pid %d while expecting SIGCHLD(17) from pid %d | uid = %d, status = %d", siginfo.ssi_signo, siginfo.ssi_pid, *pid, siginfo.ssi_uid, siginfo.ssi_status);
    if (siginfo.ssi_signo != SIGCHLD) {
        kill(*pid, siginfo.ssi_signo);
        INFO("forwarded signal %d to pid %d", siginfo.ssi_signo, *pid);
        return 0;
    }

I've tried this with 3 official kernels, and there is one difference in the output.
Kernels 3.7.9, 3.8.6:

Successful case:
lxc-start 1365724008.446 NOTICE lxc_start - '/usr/local/bin/node' started with pid '19458'
lxc-start 1365724008.446 INFO lxc_console - no console will be used
lxc-start 1365724008.446 INFO lxc_start - got signal 17 from pid 18165 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 1
lxc-start 1365724008.446 WARN lxc_start - invalid pid for SIGCHLD
lxc-start 1365724038.306 INFO lxc_start - got signal 17 from pid 19458 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 0
lxc-start 1365724038.306 DEBUG lxc_start - container init process exited

Hanging case:
lxc-start 1365795195.358 NOTICE lxc_start - '/usr/local/bin/node' started with pid '8650'
lxc-start 1365795195.358 INFO lxc_console - no console will be used
lxc-start 1365795195.358 INFO lxc_start - got signal 17 from pid 8626 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 1
lxc-start 1365795195.358 WARN lxc_start - invalid pid for SIGCHLD
lxc-start 1365795333.347 INFO lxc_start - got signal 2 from pid 0 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 0
lxc-start 1365795333.347 INFO lxc_start - forwarded signal 2 to pid 8650

Kernel 3.9.0-rc6:

The successful case is the same, but the hanging case changes to just:
lxc-start 1365794343.870 NOTICE lxc_start - '/usr/local/bin/node' started with pid '3432'
lxc-start 1365794343.870 INFO lxc_console - no console will be used
lxc-start 1365794343.870 INFO lxc_start - got signal 17 from pid 2851 while expecting SIGCHLD(17) from pid 3432 | uid = 0, status = 1
lxc-start 1365794343.870 WARN lxc_start - invalid pid for SIGCHLD

...without forwarding signal 2 (SIGINT).

Notes:
- I'm on Mint 14 Nadia with raring packages, if that helps.
- In all cases, a signal 17 (SIGCHLD) reaches lxc-start, but it comes from a different pid and is ignored by lxc. Any idea what this could be? That process seems to have been cleaned up and no longer appears in ps aux.
- The lxc-start process should be notified with a SIGCHLD from the child's pid when the child (the init process) exits.
- This could be a kernel bug, but it's probably something unique that lxc is doing that triggers it.
- I've tried other init processes (node.js without the process.exit, and a custom C++ app that writes to stdout and exits with status 0); both make this happen much less often.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1168526/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp