On Fri, Feb 07, 2003 at 08:18:20AM +0100, Ronald Bultje wrote:
> Hi Brian,
Hi Ronald.

> it writes q\n, which quits lavplay nicely.

Right.

> After that, lavplay should be
> long gone when do_real_exit() is called.

Should be. But it might lag while it waits for disk i/o or something.

> If not, then something is
> surely wrong and we'd better kill lavplay as evilly
                                               ^^^^^^
I disagree with this "evilly". lavplay might just be waiting for disk
access. Killing a process "evilly" just because it is waiting its turn
at the disk (or some other resource) is not good process management.
But this is not the real issue.

> as possible to
> prevent zombie processes, so we kill -9 it.

This is the wrong approach. If you have to do this, then there is a
bug in lavplay. Why not address the real problem instead of hacking
around it?

> Normally, the kill -9 won't do anything because there won't be any open
> child processes.

Wrong. You did not read all of my message carefully. I will explain
again. When you issue the "kill(0,9)" you are not just killing lavplay
but every process in the same process group! This is not just the
child processes of glav, but also glav's parent and any children of
that parent that have not issued their own setpgrp()!

To show you an example of what that means (I do not mean to sound
condescending, but you didn't seem to understand the ramifications
when I explained them in my last message):

# ps -eo pid,ppid,pgrp,tty,args | less
  PID  PPID  PGRP TT       COMMAND
    1     0     0 ?        init
    2     1     1 ?        [keventd]
    3     1     1 ?        [ksoftirqd_CPU0]
    4     1     1 ?        [kswapd]
 2260     1  2260 ?        gnome-terminal
...
...
 2311  2260  2311 pts/1    bash
...
17403  2311 17403 pts/1    /bin/bash /home/brian/bin/edlit test.mov
17410 17403 17403 pts/1    glav --size 640x480 test.mov
17411 17410 17403 pts/1    lavplay -q -g --size 640x480 test.mov
17412 17411 17403 pts/1    lavplay -q -g --size 640x480 test.mov
17413 17412 17403 pts/1    lavplay -q -g --size 640x480 test.mov
17414 17412 17403 pts/1    lavplay -q -g --size 640x480 test.mov
17415 17412 17403 pts/1    lavplay -q -g --size 640x480 test.mov
...
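The inheritance rule visible in that listing can be checked with a small C sketch. It is an illustration only, not code from glav or lavplay: child_shares_parent_pgrp() is a hypothetical helper that forks a child and reports whether the child ended up in its parent's process group, optionally after calling setpgid(0, 0), which is the POSIX spelling of what setpgrp() does on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child and report whether it is in the
 * same process group as its parent.  If new_group is non-zero, the
 * child first calls setpgid(0, 0) to lead a group of its own.
 * Returns 1 if the child shares the parent's group, 0 if it does not,
 * and -1 on any error. */
static int child_shares_parent_pgrp(int new_group)
{
    pid_t parent_pgrp = getpgrp();
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (new_group && setpgid(0, 0) != 0)
            _exit(2);                      /* could not create a group */
        /* Report the comparison back through the exit status. */
        _exit(getpgrp() == parent_pgrp ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

Without setpgid() the child inherits the parent's group, which is exactly how glav and every lavplay ended up sharing PGRP 17403 with the edlit script.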
As you can see, glav was started by a shell script (PID 17403), which
was started by an interactive shell (PID 2311), which was started by a
"gnome-terminal" (PID 2260). When the interactive shell (PID 2311)
starts, it starts a new "process group", in this case process group
2311. All processes that inherit from that shell, unless they issue a
setpgrp(), will have the same process group ID. But notice that the
shell script that started glav created a new process group (PGRP
17403), but glav _did_not_. So when glav issues a kill(0,9), every
process with PGRP == 17403 will be killed with a SIGKILL, and that
includes the process that started glav (and any other processes that
that shell script may have started -- none in this example)!

glav has no business killing any processes except those that it
spawned. It should really only kill (nothing IMHO, but if you insist,)
the children it knows it's responsible for, which is "pid" (the same
"pid" that glav does a waitpid() on directly after killing it).

If you really want to keep the kill(), please either:

a. change the first argument of kill() from 0 to pid, or
b. if you still want that "wide sweeping" kill, at least have glav
   create a new process group with setpgrp(), or
c. alternatively change the first argument of kill() to -1, or
d. just use wait().

> I'd call this "not a bug, but a feature".

No, it's not a feature. Having glav kill its parent and its parent's
children is a bug.

b.
-- 
Brian J. Murrell
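Options a and b can be sketched in C as follows. This is a hedged illustration under stated assumptions, not glav's actual code: stop_child() and group_kill_in_own_group() are hypothetical names, and the forked stand-ins for glav and lavplay only call pause(). The comment in stop_child() lists how POSIX kill() interprets its first argument. A positive pid is the only value that targets a single process, so it cannot reach glav's parent. Note that -1 reaches every process the caller may signal, not just glav's children.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Option a (hypothetical helper): signal only the child we spawned,
 * then reap it.  POSIX kill() treats its first argument as:
 *   pid > 0    the single process pid
 *   pid == 0   every process in the caller's process group
 *   pid == -1  every process the caller has permission to signal
 *   pid < -1   every process in process group -pid
 * Returns the raw wait status (inspect with WIFEXITED()/WTERMSIG()),
 * or -1 on error. */
static int stop_child(pid_t pid, int force)
{
    assert(pid > 0);             /* never fall into the group-wide cases */
    if (force && kill(pid, SIGKILL) != 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}

/* Option b, demonstrated safely: a forked stand-in for glav moves into
 * its own process group before spawning a stand-in player, then issues
 * the group-wide kill(0, SIGKILL).  Only the new group is hit; the
 * process that started the stand-in survives.
 * Returns the signal that terminated the stand-in, or -1. */
static int group_kill_in_own_group(void)
{
    pid_t glav = fork();
    if (glav < 0)
        return -1;
    if (glav == 0) {
        if (setpgid(0, 0) != 0)
            _exit(1);            /* refuse to kill(0) in the old group */
        pid_t player = fork();
        if (player == 0) {
            pause();             /* stand-in for lavplay */
            _exit(0);
        }
        kill(0, SIGKILL);        /* hits only the stand-in and its player */
        _exit(1);                /* not reached if SIGKILL was delivered */
    }
    int status;
    if (waitpid(glav, &status, 0) != glav || !WIFSIGNALED(status))
        return -1;
    return WTERMSIG(status);
}
```

Calling stop_child(pid, 0) just waits for the child, close in spirit to option d; stop_child(pid, 1) is option a. In group_kill_in_own_group() the caller of kill(0, SIGKILL) is itself in the target group, so it dies too. That is why the sketch runs it in a forked stand-in.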