>>>>> "Brian" == Brian May <[EMAIL PROTECTED]> writes:
Brian> That's not the way I grok the strace log:
Brian> 4266 unlink("/var/run/amavis/amavisd.pid" <unfinished ...>
Brian> 12868 <... rt_sigaction resumed> {SIG_DFL}, 8) = 0
Brian> 20650 rt_sigaction(SIGCHLD, {SIG_DFL}, <unfinished ...>
Brian> 4266 <... unlink resumed> ) = 0
Brian> So it would appear that 4266 is deleting the PID file
Brian> before it exits.
Brian> However I am getting lost in a maze of PIDs, will continue
Brian> investigating later.
Brian> --
Brian> Brian May <[EMAIL PROTECTED]>
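Interleaved calls like the unlink quoted above can be stitched back
together mechanically. A minimal sketch in Python, assuming `strace -f`
style output where every line starts with the PID of the caller; the
function name `stitch` and the sample list are mine, not part of any tool:

```python
import re

# Assumption: each log line starts with the PID, as `strace -f` prints it.
UNFINISHED = re.compile(r"^(\d+)\s+(.*?)\s*<unfinished \.\.\.>\s*$")
RESUMED = re.compile(r"^(\d+)\s+<\.\.\. (\w+) resumed>\s?(.*)$")

def stitch(lines):
    """Merge '<unfinished ...>' / '<... resumed>' pairs into single lines."""
    pending = {}  # pid -> start of a call that has not returned yet
    out = []
    for line in lines:
        line = line.rstrip("\n")
        m = UNFINISHED.match(line)
        if m:
            pending[m.group(1)] = m.group(2)
            continue
        m = RESUMED.match(line)
        if m and m.group(1) in pending:
            pid = m.group(1)
            out.append(f"{pid} {pending.pop(pid)}{m.group(3)}")
            continue
        out.append(line)
    # Calls that never resumed are kept, still marked as unfinished.
    for pid, start in pending.items():
        out.append(f"{pid} {start} <unfinished ...>")
    return out

# Sample lines taken from the log quoted above.
log = [
    '4266 unlink("/var/run/amavis/amavisd.pid" <unfinished ...>',
    '20650 rt_sigaction(SIGCHLD, {SIG_DFL}, <unfinished ...>',
    '4266 <... unlink resumed> ) = 0',
]
for line in stitch(log):
    print(line)
```

This only pairs a resumed line with the most recent unfinished call of
the same PID, which is how strace reports them as far as I can tell.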
Here is a rough process tree, in a format that I (tm) understand:
31119 execve("/etc/init.d/amavis", ["/etc/init.d/amavis", "start"],
[/* 22 vars */]) = 0
----> 5148
31119 --- SIGCHLD (Child exited) @ 0 (0) ---
----> 28681 execve("/bin/chown", ["chown", "-c", "-h", "amavis:amavis",
"/var/run/amavis"], [/* 20 vars */]) = 0
31119 --- SIGCHLD (Child exited) @ 0 (0) ---
----> 14384 execve("/bin/chmod", ["chmod", "-c", "755", "/var/run/amavis"],
[/* 20 vars */]) = 0
31119 --- SIGCHLD (Child exited) @ 0 (0) ---
----> 5948 execve("/sbin/start-stop-daemon", ["start-stop-daemon", "--start",
"--quiet", "--pidfile", "/var/run/amavis/amavisd.pid", "--name", "amavisd-new",
"--startas", "/usr/sbin/amavisd-new", "--", "-P",
"/var/run/amavis/amavisd.pid", "start"], [/* 20 vars */]) = 0
---> 27723 execve("/bin/sh", ["sh", "-c", "run-parts --list
\"/usr/share/ama"...], [/* 20 vars */] <unfinished ...>
5948 --- SIGCHLD (Child exited) @ 0 (0) ---
---> 27966 execve("/bin/sh", ["sh", "-c", "run-parts --list
\"/etc/amavis/co"...], [/* 20 vars */] <unfinished ...>
5948 --- SIGCHLD (Child exited) @ 0 (0) ---
---> 17788 execve("/usr/bin/head", ["head", "-n", "1", "/etc/mailname"],
[/* 21 vars */] <unfinished ...>
5948 --- SIGCHLD (Child exited) @ 0 (0) ---
---> 23935 execve("/bin/hostname", ["hostname", "--fqdn"],
[/* 21 vars */] <unfinished ...>
5948 --- SIGCHLD (Child exited) @ 0 (0) ---
---> 25027 continues
5948 exit_group(0)
31119 --- SIGCHLD (Child exited) @ 0 (0) ---
31119 wait4(-1, 0x7bc972e62554, WNOHANG, NULL) = -1 ECHILD (No child processes)
25027 continues
31119 exit_group(0)
25027 continues
----> 4266 continues
25027 continues
4266 --- SIGPIPE (Broken pipe) @ 0 (0) ---
---> 15290 continues
15290 execve("/usr/bin/dccproc", ["/usr/bin/dccproc",
"-H", "-x", "0"], [/* 23 vars */]) = 0
4266 --- SIGCHLD (Child exited) @ 0 (0) ---
---> 20650 continues
---> 12868 continues
25027 --- SIGALRM (Alarm clock) @ 0 (0) ---
25027 kill(4266, SIGTERM) = 0
25027 wait4(4266, <unfinished ...>
4266 --- SIGTERM (Terminated) @ 0 (0) ---
4266 kill(12868, SIGTERM) = 0
4266 kill(20650, SIGTERM <unfinished ...>
12868 --- SIGTERM (Terminated) @ 0 (0) ---
20650 --- SIGTERM (Terminated) @ 0 (0) ---
...
12868 exit_group(0)
4266 --- SIGCHLD (Child exited) @ 0 (0) ---
4266 unlink("/var/run/amavis/amavisd.pid" <unfinished ...>
20650 exit_group(0)
4266 exit_group(0)
25027 --- SIGCHLD (Child exited) @ 0 (0) ---
----> 12365
12365 --- SIGPIPE (Broken pipe) @ 0 (0) ---
-----> 15895
-----> 32317
25027 --- SIGALRM (Alarm clock) @ 0 (0) ---
25027 kill(12365, SIGTERM) = 0
25027 wait4(12365, <unfinished ...>
12365 --- SIGTERM (Terminated) @ 0 (0) ---
12365 kill(32317, SIGTERM) = 0
12365 kill(15895, SIGTERM <unfinished ...>
32317 --- SIGTERM (Terminated) @ 0 (0) ---
15895 --- SIGTERM (Terminated) @ 0 (0) ---
----> 12365 exit_group(0)
25027 --- SIGCHLD (Child exited) @ 0 (0) ---
----> 7431
7431 --- SIGPIPE (Broken pipe) @ 0 (0) ---
----> 2578
2578 --- SIGPIPE (Broken pipe) @ 0 (0) ---
32317 exit_group(0) = ?
15895 exit_group(0) = ?
25027 --- SIGTERM (Terminated) @ 0 (0) ---
25027 kill(7431, SIGTERM) = 0
25027 kill(2578, SIGTERM <unfinished ...>
7431 --- SIGTERM (Terminated) @ 0 (0) ---
2578 --- SIGTERM (Terminated) @ 0 (0) ---
25027 wait4(-1, <unfinished ...>
7431 exit_group(0) = ?
2578 exit_group(0) = ?
For some reason:
25027 receives a SIGALRM and, as a result, kills its child (4266). The
child, before it dies, kills its own children.
25027 then starts another child (12365) and, after a second SIGALRM,
kills it in the same way.
25027 then forks two more children (7431 and 2578). I am guessing this
is some sort of failsafe mode?
Finding out why 25027 is killing its children is probably the key
issue here.
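To make the "who kills whom" part less error-prone than reading it by
eye, something like the following could pull the kill() calls and signal
deliveries out of the log. It is only a sketch: it assumes the same
`strace -f` line format as above, and the function name `summarise` is
made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: lines look like "PID kill(TARGET, SIGNAME..." and
# "PID --- SIGNAME (...) ---", as in the log above.
KILL = re.compile(r"^(\d+)\s+kill\((\d+), (SIG\w+)")
SIGNAL = re.compile(r"^(\d+)\s+--- (SIG\w+)")

def summarise(lines):
    """Return (kills, received): who signalled whom, and who got what."""
    kills = defaultdict(list)     # sender pid -> [(target pid, signal)]
    received = defaultdict(list)  # pid -> [signal, ...]
    for line in lines:
        m = KILL.match(line)
        if m:
            kills[m.group(1)].append((m.group(2), m.group(3)))
            continue
        m = SIGNAL.match(line)
        if m:
            received[m.group(1)].append(m.group(2))
    return kills, received

# Sample lines copied from the first SIGALRM round in the log.
sample = [
    "25027 --- SIGALRM (Alarm clock) @ 0 (0) ---",
    "25027 kill(4266, SIGTERM) = 0",
    "4266 --- SIGTERM (Terminated) @ 0 (0) ---",
    "4266 kill(12868, SIGTERM) = 0",
    "4266 kill(20650, SIGTERM <unfinished ...>",
]
kills, received = summarise(sample)
for sender, targets in kills.items():
    for target, sig in targets:
        print(f"{sender} -> {target} ({sig}), after {received[sender]}")
```

It says nothing about why the SIGALRM fires, but it does show at a
glance that each kill follows a signal delivered to the sender.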
--
Brian May <[EMAIL PROTECTED]>