Steve,

Can someone from Foundations evaluate this bug? The core issue is that when a child process dies, the parent process is also killed.

Michael

On 09/30/2016 02:02 AM, Launchpad Bug Tracker wrote:
> bugproxy (bugproxy) has assigned this bug to you for Ubuntu:
>
> Problem Description
> ===========================
> I wrote a simple systemd service that forks child processes aggressively.
> The service quickly failed:
>
> % sudo systemctl status reproducer.service
> ● reproducer.service - Reproducer of systemd services killed by ips
>    Loaded: loaded (/etc/systemd/system/reproducer.service; disabled; vendor preset: enabled)
>    Active: failed (Result: exit-code) since Fri 2016-03-18 06:58:37 CDT; 2min 43s ago
>   Process: 5103 ExecStart=/home/hpt/reproducer/reproducer.sh (code=exited, status=0/SUCCESS)
>  Main PID: 5105 (code=exited, status=254)
>
> Mar 18 06:58:36 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
> Mar 18 06:58:36 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
> Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
> Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
> Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
> Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
> Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Main process exited, code=exited, status=254/n/a
> Mar 18 06:58:37 pinelp3 reproducer.sh[5103]: /home/hpt/reproducer/reproducer.sh: fork: Resource temporarily unavailable
> Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Unit entered failed state.
> Mar 18 06:58:37 pinelp3 systemd[1]: reproducer.service: Failed with result 'exit-code'.
>
> The default task limit of systemd services is 512.
> Looks like the service is terminated by the kernel's pids cgroup
> controller. I think this isn't correct: failing to fork child processes
> shouldn't cause the parent to die.
>
>
> % cat /etc/systemd/system/reproducer.service
> [Unit]
> Description=Reproducer of systemd services killed by ips
> After=multi-user.target
>
> [Service]
> ExecStart=/home/hpt/reproducer/reproducer.sh
> Type=forking
>
> [Install]
> WantedBy=multi-user.target
>
> % cat /home/hpt/reproducer/reproducer.sh
> #!/bin/bash
>
> foo()
> {
>     #exec sh -c "echo $1: \$\$;sleep 60"
>     echo $1:
>     sleep 60
> }
>
> bar()
> {
>     c=1
>     while true
>     do
>         for ((i=1;i<=2048;i++))
>         do
>             foo $c &
>             ((c++))
>         done
>
>         wait
>         c=1
>     done
> }
>
> # main
> bar &
>
> disown -a
>
> exit 0
>
>
> ---uname output---
> Linux pinelp3 4.4.0-12-generic #28-Ubuntu SMP Wed Mar 9 00:40:38 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
>
> Machine Type = IBM,8408-E8E,lpar
>
> Steps to Reproduce
> ================================
> 1. install the simple service in "Problem description"
> 2. sudo systemctl start reproducer.service
> 3. wait 2~3 minutes
>
> == Comment: #3 - Vaishnavi Bhat <vaish...@in.ibm.com> - 2016-03-22 11:21:55 ==
> From the machine,
> root@pinelp3:~# ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 48192
> max locked memory       (kbytes, -l) 64
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 8192
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 48192
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> root@pinelp3:~# ps aux | wc -l    --------->While the service is running
> 1084
> root@pinelp3:~# ps aux | wc -l    "
> 1084
> root@pinelp3:~# ps aux | wc -l    "
> 1084
> root@pinelp3:~# ps aux | wc -l    "
> 1084
> root@pinelp3:~# ps aux | wc -l    ---------->While the service is not running.
> 572
>
> root@pinelp3:~# free -m    --------------> While service is running
>               total        used        free      shared  buff/cache   available
> Mem:          12117         628         459          22       11029        9541
> Swap:          2052           8        2044
>
> root@pinelp3:~# free -m    ------------> while the service is not running.
>               total        used        free      shared  buff/cache   available
> Mem:          12117         308         809          22       10999        9890
> Swap:          2052           8        2044
>
> == Comment: #4 - Breno Henrique Leitao <bren...@br.ibm.com> - 2016-03-22 11:48:57 ==
> This is a new feature in Ubuntu and Systemd that limits the number of
> processes/children created.
>
> You can disable it as documented in
> https://wiki.ubuntu.com/ppc64el/Recommendations#Max_pids_on_Ubuntu_16.04
>
> == Comment: #5 - Ping Tian Han <pt...@cn.ibm.com> - 2016-03-22 20:24:33 ==
> (In reply to comment #4)
>> This is a new feature in Ubuntu and Systemd that limits the number of
>> processes/children created.
>>
>> You can disable it as documented in
>> https://wiki.ubuntu.com/ppc64el/Recommendations#Max_pids_on_Ubuntu_16.04
> Hi,
>
> We know about this feature.
> But it is the main process of the service that failed when it cannot
> fork child processes. This is the problem we want to know how to fix.
> Thanks.
>
> == Comment: #6 - Breno Henrique Leitao <bren...@br.ibm.com> - 2016-03-24 13:58:08 ==
> Hi,
>
>> We know about this feature. But it is the main process of the service that
>> failed when it cannot fork child processes. This is the problem we want to
>> know how to fix. Thanks.
> What do you want to fix exactly? There was an RFC to limit the maximum
> number of processes that any systemd service can spawn.
>
> This came from the Systemd creator at
> https://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.html.
>
> If you wish to have a service that needs more than 512 processes, just
> enable it in the service file, as:
>
> [Unit]
> Description=Reproducer of systemd services killed by ips
> After=multi-user.target
>
> [Service]
> ExecStart=/home/hpt/reproducer/reproducer.sh
> Type=forking
> TasksMax=1024 (or bigger)
>
> [Install]
> WantedBy=multi-user.target
>
> Anyway, what are you trying to test exactly, and why do you think it
> fails?
>
> == Comment: #7 - Ping Tian Han <pt...@cn.ibm.com> - 2016-03-24 21:26:35 ==
> (In reply to comment #6)
>> Hi,
>>
>>> We know about this feature. But it is the main process of the service that
>>> failed when it cannot fork child processes. This is the problem we want to
>>> know how to fix. Thanks.
>> What do you want to fix exactly? There was an RFC to limit the maximum
>> number of processes that any systemd service can spawn.
>>
>> This came from the Systemd creator at
>> https://lists.freedesktop.org/archives/systemd-devel/2015-November/035006.html.
>>
>> If you wish to have a service that needs more than 512 processes, just
>> enable it in the service file, as:
>>
>> [Unit]
>> Description=Reproducer of systemd services killed by ips
>> After=multi-user.target
>>
>> [Service]
>> ExecStart=/home/hpt/reproducer/reproducer.sh
>> Type=forking
>> TasksMax=1024 (or bigger)
>>
>> [Install]
>> WantedBy=multi-user.target
>>
>> Anyway, what are you trying to test exactly, and why do you think it fails?
> Hi,
>
> We saw that the main process of the example service quit (or was killed?)
> for some reason when it could not fork more child processes. This is the
> problem. We can tolerate there being only 512 child processes, but not
> the main process being killed.
>
> We think that even though the main process cannot fork more than 512
> child processes, it shouldn't be killed.
>
> thanks.
>
> == Comment: #8 - Breno Henrique Leitao <bren...@br.ibm.com> - 2016-03-29 10:16:04 ==
> Hi Ping,
>
>> We think that even though the main process cannot fork more than 512 child
>> processes, it shouldn't be killed.
> Right. Now I understand your point. So, what is the main process here?
> Is it your service process, or the systemd process?
>
> PS: I am decreasing the priority to normal, since this is very much a
> corner-case usage.
>
> == Comment: #9 - Kevin W. Rudd <ru...@us.ibm.com> - 2016-03-29 15:50:15 ==
> The process in question wasn't killed. It did its own exit:
>
> ...
> 19236 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x3fff96c25aa0) = -1 EAGAIN (Resource temporarily unavailable)
> 19236 write(2, "/usr/local/bin/real_forkit.sh: f"..., 70) = 70
> 19236 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 19236 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
> 19236 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> 19236 exit_group(254) = ?
> 19236 +++ exited with 254 +++
>
> The problem here is that the replication script is not *handling* the
> error.
> It is not even looking at the return code of the function it is trying to
> fork. It is very possible that the shell itself triggered its own
> protection mechanism when faced with a script that appeared to be running
> away (the above exit was not after the first clone failure, but after
> several calls to clone had failed).
>
> Personally, if I were the admin of this system, this is exactly what I
> would want to happen to a runaway process (smack it down quickly).
>
> == Comment: #10 - Ping Tian Han <pt...@cn.ibm.com> - 2016-03-29 21:17:16 ==
> (In reply to comment #9)
>> The process in question wasn't killed. It did its own exit:
>>
>> ...
>> 19236 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x3fff96c25aa0) = -1 EAGAIN (Resource temporarily unavailable)
>> 19236 write(2, "/usr/local/bin/real_forkit.sh: f"..., 70) = 70
>> 19236 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>> 19236 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
>> 19236 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
>> 19236 exit_group(254) = ?
>> 19236 +++ exited with 254 +++
>>
>> The problem here is that the replication script is not *handling* the error.
>> It is not even looking at the return code of the function it is trying to
>> fork. It is very possible that the shell itself triggered its own
>> protection mechanism when faced with a script that appeared to be running
>> away (the above exit was not after the first clone failure, but after
>> several calls to clone had failed).
>>
> Thanks, Kevin. Looks like this is a problem in bash? I don't think there is
> any reason for a parent shell process to quit just because it cannot fork
> more child processes...
>
>> Personally, if I were the admin of this system, this is exactly what I would
>> want to happen to a runaway process (smack it down quickly).
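
For reference, one way a script like the reproducer could stay under the task limit, rather than run into fork failures, is to cap how many background jobs it keeps alive at once. The following is only a minimal sketch, assuming a POSIX-compatible shell. The names MAX_JOBS, worker, and run_bounded are hypothetical and do not appear in the bug report, and the cap of 4 is purely illustrative.

```shell
#!/bin/sh
# Hypothetical cap; a real service would pick a value below its TasksMax.
MAX_JOBS=4

# Stand-in for foo() in the reproducer (sleeps briefly instead of 60 s).
worker() {
    sleep 0.1
}

# Start $1 workers in the background, in batches of at most MAX_JOBS.
# Prints the largest number of workers that were in flight at once.
run_bounded() {
    total=$1
    running=0
    peak=0
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$running" -ge "$MAX_JOBS" ]; then
            wait            # let the current batch finish before starting more
            running=0
        fi
        worker &
        running=$((running + 1))
        if [ "$running" -gt "$peak" ]; then
            peak=$running
        fi
        i=$((i + 1))
    done
    wait                    # reap the final batch
    echo "$peak"
}
```

Calling run_bounded 2048 would start as many workers as one pass of the reproducer's inner loop, but never more than MAX_JOBS at a time. This only bounds the script's own children. It does not change how the shell reacts when a fork does fail.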
>
> ** Affects: ubuntu
>      Importance: Undecided
>      Assignee: Taco Screen team (taco-screen-team)
>          Status: New
>
>
> ** Tags: architecture-ppc64le bugnameltc-139320 severity-medium targetmilestone-inin---

-- 
Michael Hohnbaum
OIL Program Manager
Power (ppc64el) Development Project Manager
Canonical, Ltd.

-- 
https://bugs.launchpad.net/bugs/1629226

Title:
  systemd's service killed by cgroup controller pids