On Tue, May 17, 2005 at 02:29:41PM +0200, [EMAIL PROTECTED] wrote:
> I have a problem with a bash script. The script (example) is very simple:
>
> ----script.sh-----------------------
> #!/bin/bash
>
> echo hello
> ssh PT-AGCMLX1 "while true; do date; sleep 10s; done"
> ------------------------------------
>
> When I start script.sh, look up its pid via ps and kill it, the ssh keeps
> running:
>
>  9311 pts/4    S+     0:00 /bin/bash ./script.sh
>  9312 pts/4    S+     0:00 ssh PT-AGCMLX1 while true; do date; sleep 10s; done
>
> > kill 9311
>
>  9312 pts/4    S      0:00 ssh PT-AGCMLX1 while true; do date; sleep 10s; done
>
> How can I change my script so that it kills all its child processes, if it
> is killed itself ? I tried to use the "trap" function of bash, but it
> never used the correct pid...

You probably want to kill the process _group_, as follows:

  $ /bin/kill -- -9311

Specifying the respective PID as a negative number tells kill to
interpret it as a process group ID. For this to work, you also need the
'--', to prevent the minus sign from mistakenly being parsed as the
start of an option. Also, you have to use the real /bin/kill, not the
respective bash builtin.

You could, of course, put this in script.sh itself, by trapping EXIT:

  #!/bin/bash
  trap "/bin/kill -- -$$" EXIT
  ...

(The process group ID will be equal to $$ here.)

Then, a normal "kill 9311" should do, because now, upon exit, the shell
will kill its own process group. Alternatively, simply use

  $ killall -g script.sh

So far, so good. Unfortunately, there's still one problem remaining:
the command which is run on the remote machine will not be terminated.

Just to illustrate, after having started script.sh, you'd see something
like this (I'm simply using localhost in place of PT-AGCMLX1):

  $ ps axf -o pid,ppid,pgrp,session,cmd
    PID  PPID  PGRP  SESS CMD
  ...
  19468 28299 19468 28299  |   \_ /bin/bash ./script.sh
  19469 19468 19468 28299  |       \_ ssh localhost while true; do date; sleep 10s; done
  ...
  19472 19470   257   257      \_ /usr/sbin/sshd
  19475 19472 19475 19475          \_ bash -c while true; do date; sleep 10s; done
  19507 19475 19475 19475              \_ sleep 10s
  ...

After having killed the process group 19468, the remote-side processes
would still be there. Apparently they're not being killed when sshd
(the forked sshd process handling that command, that is) terminates:

  ...
  19475     1 19475 19475 bash -c while true; do date; sleep 10s; done
  19567 19475 19475 19475  \_ sleep 10s
  ...

I can't offer any simple solution for this :( This is something I'd be
interested in myself. So, if anyone should know the magic formula which
would make sshd kill its children, please let us know...

> On Tue, May 17, 2005 at 01:29:40PM +0200, Dennis Stosberg wrote:
> >
> > Have you tried to use "exec ssh PT-..." instead?
>
> No, I didn't know that.
> It's interesting, but unfortunately I can't use that. I need to be able
> to kill the script via "killall script.sh" and after the exec, script.sh
> isn't there anymore. I could use 'exec -a script.sh', but the killall
> command won't work either...

killall relies on /proc/<PID>/stat to find the appropriate processes, but
exec -a only modifies /proc/<PID>/cmdline.

In case you really need to use the name "script.sh" for killall, you might
want to create a link

  ln -s /usr/bin/ssh script.sh

and then "exec script.sh ..." (of course, the former script.sh would then
have to have a different name...).

Cheers,
Almut
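
For anyone who wants to try the EXIT-trap idea described above without a remote host, here is a rough, self-contained sketch. It makes several assumptions that are not part of the original script: Linux with bash and setsid (util-linux), a local background loop standing in for the ssh child, and made-up names (run_demo, the heartbeat file, the with_trap/without_trap modes). It also uses bash's builtin kill, which current bash versions accept with a negative PID after '--'; substitute /bin/kill if your bash does not. It only covers the local process group, not the remote-side processes.

```bash
#!/bin/bash
# Sketch only: run_demo, the temp-dir layout and the heartbeat file are
# hypothetical names used for illustration.

run_demo() {
    local mode=$1 dir pid before after result tries=0
    dir=$(mktemp -d)

    # Stand-in for script.sh: a background loop plays the role of the
    # long-running ssh child and keeps rewriting a heartbeat file.
    cat > "$dir/script.sh" <<'EOF'
#!/bin/bash
dir=$1 mode=$2
if [ "$mode" = with_trap ]; then
    # On exit, signal our own process group (PGID == $$ because this
    # shell is the group leader). Clear the trap first so it cannot be
    # entered a second time.
    trap 'trap - EXIT; kill -- -$$' EXIT
fi
( i=0; while :; do i=$((i + 1)); echo "$i" > "$dir/heartbeat"; sleep 0.2; done ) &
echo $$ > "$dir/pid"
wait
EOF

    # setsid makes the script a process group leader, as it would be when
    # started from an interactive shell with job control.
    setsid bash "$dir/script.sh" "$dir" "$mode" </dev/null >/dev/null 2>&1 &

    # Wait up to about 5 s for the script and its child to start.
    until [ -s "$dir/pid" ] && [ -s "$dir/heartbeat" ]; do
        tries=$((tries + 1))
        if [ "$tries" -gt 50 ]; then echo "script did not start"; return 1; fi
        sleep 0.1
    done
    pid=$(cat "$dir/pid")

    kill "$pid"     # plain "kill <pid>", as in the original question
    sleep 1         # give the signal time to be handled

    # If the heartbeat no longer changes, the child loop has stopped.
    before=$(cat "$dir/heartbeat")
    sleep 1
    after=$(cat "$dir/heartbeat")
    if [ "$before" = "$after" ]; then
        result="child gone"
    else
        result="child survived"
    fi

    # Clean up any survivor by signalling the (still existing) group.
    kill -- -"$pid" 2>/dev/null
    rm -rf "$dir" 2>/dev/null
    echo "$result"
}
```

Calling "run_demo without_trap" reproduces the situation from the original question (the child outlives the killed script), while "run_demo with_trap" shows the child being taken down together with the script.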