Sirs,

I've been trying to debug a problem with Solaris 8 running on
sparc-softmmu. The syslog daemon is very unreliable: about 7 of 8 starts of
the syslog daemon end in a hang (the hung daemon can be "killed" and
restarted manually).

*Background:* I looked at the syslogd.c code on the Oracle web site to see
what syslogd is doing. As part of initialization, syslogd tries to parse
syslog.conf. To read the syslog.conf file, syslogd creates a pipe for
output from m4. m4 is used to parse the syslog.conf file. Output from m4 is
put into the pipe that is then read by the parent process.

*Here is what I've done so far / learned:*
After boot, I log in and stop the syslogd.

I then run syslogd with the "-d" flag under truss (using the -a, -f and
-sall flags).

On Qemu, the syslog daemon hangs after the child process exits. None of
the output that the child process wrote into the pipe reaches the parent
process before the hang. When I send SIGINT twice (hit ctrl-c twice -- a
single ctrl-c does not unblock the pipe), the parent process sees the data
in the pipe and finishes reading it before the syslog daemon exits due to
the SIGINT.

I thought the issue might be related to something that m4 was doing, so I
replaced it with a shell script that prints the same kind of text that m4
produces (I parsed the syslog.conf file by hand). I saw the same behaviour -
the syslogd parent process hung about the time the child process exited.

I tried this on real Sparc hardware with the same OS. On real Sparc
hardware the data appears in the pipe for use by the parent process about
the time the child process exits.

I thought the Qemu parent process might not be getting a SIGCLD or the
SIGCLD might not be sent by the child when it exits. So I have tried
sending SIGCLD manually using "kill". If I send SIGCLD twice (once does not
unblock the pipe, but I do see system activity from truss with this first
signal), the pipe is unblocked. The results are not consistent after the
pipe is unblocked: syslogd may post messages to the log file, I may need
to inject new messages using "logger" to push the backlog of messages to
the log file, or I may need to restart the syslog daemon (and send SIGCLD
again) to get messages to the log file.

*What to do next?*
I am not sure what to do next to help isolate what is going on (or what
should be happening but is not). It looks like something with signals is not
working correctly. I could try putting together a simple program to create
a pipe much as is done in syslogd to try to replicate the issue with pipes
in a simple way. But, I'm not sure how to dig deeper even if I were able to
replicate the issue with a small program.
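
A minimal sketch of such a test program is below. It only mimics the pattern described above (the child writes text into a pipe and exits, the parent reads until end-of-file) and does not reproduce the actual syslogd.c code. The function name read_from_child and the sample text are invented for illustration.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes `text` into a pipe
 * and exits (a stand-in for m4), while the parent reads until EOF.
 * Returns the number of bytes read into buf (NUL-terminated),
 * or -1 on error. */
long read_from_child(const char *text, char *buf, size_t cap)
{
    int fds[2];
    pid_t pid;
    size_t total = 0;
    ssize_t n = 0;

    if (cap == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: write the text and exit, closing the write end. */
        size_t len = strlen(text);
        close(fds[0]);
        if (write(fds[1], text, len) != (ssize_t)len)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    /* Parent: close the unused write end so that read() can see EOF
     * once the child has exited. */
    close(fds[1]);
    while (total < cap - 1) {
        n = read(fds[0], buf + total, cap - 1 - total);
        if (n <= 0)
            break;                /* 0 = EOF, -1 = error */
        total += (size_t)n;
    }
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);        /* reap the child */

    return (n < 0) ? -1 : (long)total;
}
```

A small main() could call read_from_child("*.err /var/adm/messages\n", buf, sizeof buf) and print what came back. If the parent hangs in read() on Qemu but not on real hardware, the problem is reproduced without syslogd or m4 involved, and the test could then be varied (for example, with and without a SIGCHLD handler installed) to narrow it down.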

Another thought is to create a Linux / Sparc32 machine to see if this issue
is apparent there, as well. Having a simple program as noted above might
help with this.

*Suggestions?*


*Other info:*
QEMU-1.1.0rc2
Openbios-1057
SunOS Release 5.8 Version Generic_108528-11 32-bit


Respectfully,
Paul
