The following kernel patch fixes it for me; I will send it to lkml:

diff --git a/debian.master/changelog b/debian.master/changelog
index f8f7a35a..081e666 100644
--- a/debian.master/changelog
+++ b/debian.master/changelog
@@ -1,3 +1,9 @@
+linux (3.11.0-4.9debug1) saucy; urgency=low
+
+  * debug 1
+
+ -- Serge Hallyn <serge@tangerine.buildd>  Thu, 29 Aug 2013 13:34:43 +0000
+
 linux (3.11.0-4.9) saucy; urgency=low
 
   [ Tim Gardner ]
diff --git a/debian/rules b/debian/rules
index 2d3358b..f87f26c 100755
--- a/debian/rules
+++ b/debian/rules
@@ -13,6 +13,8 @@ DEBIAN=$(shell awk -F= '($$1 == "DEBIAN") { print $$2 }' <debian/debian.env)
 # with the kernel build.
 unexport CFLAGS
 unexport LDFLAGS
+export skipmodules=true
+export skipabi=true
 
 export LC_ALL=C
 export SHELL=/bin/bash -e
diff --git a/debian/scripts/module-check b/debian/scripts/module-check
index c754ea3..280b6e9 100755
--- a/debian/scripts/module-check
+++ b/debian/scripts/module-check
@@ -4,6 +4,7 @@ $flavour = shift;
 $prev_abidir = shift;
 $abidir = shift;
 $skipmodule = shift;
+$skipmodule = 1;
 
 print "II: Checking modules for $flavour...";
 
diff --git a/kernel/pid.c b/kernel/pid.c
index 66505c1..3cccab3 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -275,6 +275,10 @@ void free_pid(struct pid *pid)
                case 0:
                        schedule_work(&ns->proc_work);
                        break;
+               default:
+                       if (ns->child_reaper->flags & PF_EXITING)
+                               wake_up_process(ns->child_reaper);
+                       break;
                }
        }
        spin_unlock_irqrestore(&pidmap_lock, flags);

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1168526

Title:
  race condition causing lxc to not detect container init process exit

Status in “linux” package in Ubuntu:
  Confirmed
Status in “lxc” package in Ubuntu:
  Fix Committed

Bug description:
  For the purpose of the repro, my lxc init process is node.js v0.11.0
  (built from source) with a single line:

  process.exit(0);

  When running it in lxc, sometimes lxc doesn't exit. lxc-start remains
  a parent of a defunct node process without reaping it or exiting.

  I've made a custom build of lxc 0.9.0 to extract more information
  about this, adding only an INFO line, as follows:

  start.c:

          if (ret != sizeof(siginfo)) {
                  ERROR("unexpected siginfo size");
                  return -1;
          }
  +        INFO("got signal %d from pid %d while expecting SIGCHLD(17) from pid %d | uid = %d, status = %d", siginfo.ssi_signo, siginfo.ssi_pid, *pid, siginfo.ssi_uid, siginfo.ssi_status);

          if (siginfo.ssi_signo != SIGCHLD) {
                  kill(*pid, siginfo.ssi_signo);
                  INFO("forwarded signal %d to pid %d", siginfo.ssi_signo, *pid);
                  return 0;
          }
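
  The decision that the snippet and the log messages below imply can be
  sketched as a small pure function. This is only an illustration, not
  the lxc source: the enum, classify_signal() and make_siginfo() are
  hypothetical names, and the three outcomes are inferred from the
  "forwarded signal", "invalid pid for SIGCHLD" and "container init
  process exited" messages.

```c
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/signalfd.h>
#include <sys/types.h>

/* Hypothetical names for illustration; not taken from the lxc source. */
enum sig_action {
    ACT_FORWARD,      /* not SIGCHLD: pass the signal on to the container init */
    ACT_IGNORE,       /* SIGCHLD, but sent on behalf of some other pid */
    ACT_INIT_EXITED   /* SIGCHLD for the container init itself */
};

/* Decide what to do with one record read from a signalfd. */
static enum sig_action classify_signal(const struct signalfd_siginfo *si,
                                       pid_t init_pid)
{
    if (si->ssi_signo != SIGCHLD)
        return ACT_FORWARD;
    if ((pid_t)si->ssi_pid != init_pid)
        return ACT_IGNORE;
    return ACT_INIT_EXITED;
}

/* Build a zeroed siginfo record with only the two fields used above set. */
static struct signalfd_siginfo make_siginfo(int signo, pid_t sender)
{
    struct signalfd_siginfo si;

    memset(&si, 0, sizeof(si));
    si.ssi_signo = (uint32_t)signo;
    si.ssi_pid = (uint32_t)sender;
    return si;
}
```

  Feeding it the signal number and pid pairs from the logs (for example
  SIGCHLD from 18165 while expecting 19458) shows why the first SIGCHLD
  in every run is dropped: it names a pid other than the container init.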

  I've tried this with 3 official kernels. There is one difference in
  output.

  Kernels 3.7.9, 3.8.6:

  Successful case:

        lxc-start 1365724008.446 NOTICE   lxc_start - '/usr/local/bin/node' started with pid '19458'
        lxc-start 1365724008.446 INFO     lxc_console - no console will be used
        lxc-start 1365724008.446 INFO     lxc_start - got signal 17 from pid 18165 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 1
        lxc-start 1365724008.446 WARN     lxc_start - invalid pid for SIGCHLD
        lxc-start 1365724038.306 INFO     lxc_start - got signal 17 from pid 19458 while expecting SIGCHLD(17) from pid 19458 | uid = 0, status = 0
        lxc-start 1365724038.306 DEBUG    lxc_start - container init process exited

  Hanging case:

        lxc-start 1365795195.358 NOTICE   lxc_start - '/usr/local/bin/node' started with pid '8650'
        lxc-start 1365795195.358 INFO     lxc_console - no console will be used
        lxc-start 1365795195.358 INFO     lxc_start - got signal 17 from pid 8626 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 1
        lxc-start 1365795195.358 WARN     lxc_start - invalid pid for SIGCHLD
        lxc-start 1365795333.347 INFO     lxc_start - got signal 2 from pid 0 while expecting SIGCHLD(17) from pid 8650 | uid = 0, status = 0
        lxc-start 1365795333.347 INFO     lxc_start - forwarded signal 2 to pid 8650

  Kernel 3.9.0-rc6:

  Successful case is the same, but the hanging case changes to just:

        lxc-start 1365794343.870 NOTICE   lxc_start - '/usr/local/bin/node' started with pid '3432'
        lxc-start 1365794343.870 INFO     lxc_console - no console will be used
        lxc-start 1365794343.870 INFO     lxc_start - got signal 17 from pid 2851 while expecting SIGCHLD(17) from pid 3432 | uid = 0, status = 1
        lxc-start 1365794343.870 WARN     lxc_start - invalid pid for SIGCHLD

  ... without forwarding signal 2 (SIGINT).

  Notes:
  - I'm on Mint 14 Nadia with raring packages, if that helps.
  - In all cases, there is signal 17 (SIGCHLD) coming in to lxc-start, but it
    comes from a different pid and is ignored by lxc. Any idea what this could be?
    This process seems to have been cleaned up and no longer appears in ps aux.
  - The lxc-start process should be getting notified with a SIGCHLD from the
    child's pid when the child (init process) exits.
  - This could be a kernel bug, but it's probably something unique that lxc is
    doing to trigger it.
  - I've tried other init processes (node.js without the process.exit and a
    custom c++ app with a stdout write and exit 0), which greatly reduce the
    frequency of this happening.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1168526/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
