Quoting Qiang Huang (h.huangqi...@huawei.com):
> On 2013/5/24 20:49, Serge Hallyn wrote:
> > Quoting Qiang Huang (h.huangqi...@huawei.com):
> >> Hi,
> >>
> >> I found a tricky problem in LXC. I once made a mistake in the config and set
> >>
> >> lxc.cgroup.cpuset.cpus = -1
> >>
> >> Of course the start failed, but then "lxc-ls --active" showed the container
> >> as active.
> >>
> >> The error message is:
> >> # lxc-start -n hq111 -f config_hq -l TRACE
> >> lxc-start: Invalid argument - write /cgroup/lxc/hq111/cpuset.cpus :
> >> Invalid argument
> >> lxc-start: Error setting cpuset.cpus to -1 for lxc/hq111
> >>
> >> lxc-start: failed to setup the cgroups for 'hq111'
> >> lxc-start: failed to spawn 'hq111'
> >> lxc-start: Device or resource busy - failed to remove cgroup
> >> '/cgroup/lxc/hq111'
> >>
> >>
> >> This is not hard to reproduce if you keep trying, though it does not
> >> happen every time. I then read through the code and figured out that
> >> recursive_rmdir() failed because rmdir() sometimes returns -1. Any idea
> >> how to fix this?
> >
> > Could you tell us exactly which version this is, and exactly how you
> > created the container?  When I do it in ubuntu saucy (roughly 0.9.0 lxc),
> > the cgroup gets correctly removed.
> >
>
> Hi Serge,
>
> I think I have found the reason: when setup_cgroup() fails, the child process
> may still exist when the father tries to destroy the cgroup. (We have no sync
> mechanism to ensure the child exits before the father when something goes
> wrong.)
>
> commit 6031a6e5f939bda07d98768d34dafae677a7dfeb
> Author: Dwight Engen <dwight.en...@oracle.com>
> Date: Wed May 15 12:27:34 2013 -0400
>
>     set non device cgroup items before the cgroup is entered
>
>     This allows some special cgroup items such as memory.kmem.limit_in_bytes
>     to be successfully set, since they must be set before any task is put
>     into the cgroup.
>
>     The devices cgroup is setup later giving the container a chance to mount
>     file systems before the device it might want to mount from becomes
>     unavailable.
>
>     Signed-off-by: Dwight Engen <dwight.en...@oracle.com>
>     Signed-off-by: Serge Hallyn <serge.hal...@ubuntu.com>
>
> This patch moved setup_cgroup() before lxc_cgroup_enter(), so when
> setup_cgroup() fails there is no task in the cgroup yet, and removing the
> cgroup no longer fails.
>
> So my problem no longer exists on the latest code, but there are still
> potential problems if we don't ensure the child exits before the father.
> Michael's problem might also be caused by this.

Right, so other failures later on *could* still cause this.  Shall we do
something like

	{
		// Wait on any unterminated children
		int status, ret;
		while ((ret = waitpid(-1, &status, 0)) > 0);
	}

in lxc_abort() after the kill(handler->pid)?

_______________________________________________
Lxc-devel mailing list
Lxc-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/lxc-devel
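
For illustration only, here is a minimal standalone sketch of the "kill, then reap before cleaning up" ordering discussed in this thread. It is not LXC code and has not been tried against the tree: spawn_idle_child() and reap_all_children() are hypothetical names made up for this example, and the EINTR retry is an assumption about what a robust version would want rather than something anyone in the thread has proposed.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not LXC code): fork a child that simply blocks
 * until it is killed. Returns the child's pid in the parent, or -1 if
 * fork() failed. */
pid_t spawn_idle_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		for (;;)
			pause();	/* child: sleep until a signal arrives */
	}
	return pid;
}

/* Hypothetical helper: block until every child of the calling process
 * has terminated and been reaped. Returns the number of children
 * reaped, or -1 on an unexpected waitpid() error. */
int reap_all_children(void)
{
	int status;
	int reaped = 0;

	for (;;) {
		pid_t ret = waitpid(-1, &status, 0);

		if (ret > 0) {
			reaped++;	/* one more child is gone */
			continue;
		}
		if (ret < 0 && errno == EINTR)
			continue;	/* interrupted by a signal: retry */
		if (ret < 0 && errno == ECHILD)
			return reaped;	/* no children left to wait for */
		return -1;
	}
}
```

A caller would send SIGKILL to the child first and only then call reap_all_children(), so that any cleanup such as removing the cgroup directory runs after the child is really gone. Note that waitpid(-1, ...) collects every child of the process; whether that is acceptable at the point where lxc_abort() runs is a question for the actual code, and waiting on the one known pid may be the safer choice.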