On Sat, Feb 15, 2014 at 09:28:34AM -0800, SW wrote:

> So I removed zlib from OpenSSL and things seemed to be going well until this
> happened:
> 
> Feb 15 17:22:20 mail postfix/master[1555]: panic: master_reap: unknown pid:
> 27935

And you did not look earlier in the log to see what process 27935
was?  Or whether messages from master(8) just before this shed any
light on the event?

Perhaps interaction with the debugger causes waitpid(2) to report
a process more than once?  You sure have a quirky system.

From waitpid(2):

     WIFSTOPPED(status)
             True if the process has not terminated, but has stopped and can
             be restarted.  This macro can be true only if the wait call spec-
             ified the WUNTRACED option or if the child process is being
             traced (see ptrace(2)).

From ptrace(2):

     ptrace() provides tracing and debugging facilities.  It allows one
     process (the tracing process) to control another (the traced process).
     Most of the time, the traced process runs normally, but when it receives
     a signal (see sigaction(2)), it stops.  The tracing process is expected
     to notice this via wait(2) or the delivery of a SIGCHLD signal, examine
     the state of the stopped process, and cause it to terminate or continue
     as appropriate.  ptrace() is the mechanism by which all this happens.

The above seems to carry an expectation that the tracing process
is the parent, but with Postfix, the tracing process is a grand-child
of the traced process.  The parent of the traced process is master(8).

If, for some reason, your system reports gdb-traced processes to
master(8) even though they have not in fact exited, then we could
see something along these lines.  Perhaps Postfix master(8) should
check for unsolicited WIFSTOPPED() conditions that arise when a
debugger interacts with the services it is tracing.
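
A minimal sketch of what such a check could look like, in C.  This
is not taken from the Postfix source: classify_status() and
demo_stopped_child() are names made up for illustration.  The demo
only shows that a stop report arrives before the child has
terminated, so a reaper has to test WIFSTOPPED() before treating a
status as an exit.

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/types.h>
#include <sys/wait.h>
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical classification of a wait status; not Postfix code. */
enum child_event { CHILD_EXITED, CHILD_KILLED, CHILD_STOPPED, CHILD_OTHER };

enum child_event classify_status(int status)
{
    /* Test for "stopped" first: the child is still alive. */
    if (WIFSTOPPED(status))
        return CHILD_STOPPED;
    if (WIFEXITED(status))
        return CHILD_EXITED;
    if (WIFSIGNALED(status))
        return CHILD_KILLED;
    return CHILD_OTHER;
}

/*
 * Fork a child that stops itself, and wait for it with WUNTRACED so
 * that the stop is reported.  Returns the classification of that first
 * report, and stores the classification of the real termination in
 * *final_event.  Both are CHILD_OTHER on error.
 */
enum child_event demo_stopped_child(enum child_event *final_event)
{
    enum child_event first;
    int status;
    pid_t pid;

    assert(final_event != NULL);
    *final_event = CHILD_OTHER;

    pid = fork();
    if (pid < 0)
        return CHILD_OTHER;
    if (pid == 0) {
        raise(SIGSTOP);         /* child: stop, as if under a debugger */
        _exit(0);
    }

    /* First report: the child has stopped, but has not terminated. */
    if (waitpid(pid, &status, WUNTRACED) != pid) {
        kill(pid, SIGKILL);
        (void) waitpid(pid, &status, 0);
        return CHILD_OTHER;
    }
    first = classify_status(status);

    /* Second report for the same pid: the actual termination. */
    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) == pid)
        *final_event = classify_status(status);
    return first;
}
```

In a real reaper loop, a CHILD_STOPPED result would mean leaving the
child's bookkeeping alone, since the same pid will be reported again
when the process really terminates.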

-- 
        Viktor.
