On Thu, May 17, 2012 at 11:07 PM, Paul Wilhelm <bearcat.pi...@gmail.com> wrote:
> Sirs,
>
> I've been trying to debug a problem with Solaris 8 running on sparc-softmmu.
> The syslog daemon is very unreliable (about 7 of 8 starts of the syslog
> daemon end in a daemon hang - the daemon can be "killed" and restarted
> manually).
>
> Background: I looked at the syslogd.c code on the Oracle web site to see
> what syslogd is doing. As part of initialization, syslogd tries to parse
> syslog.conf. To read the syslog.conf file, syslogd creates a pipe for output
> from m4. m4 is used to parse the syslog.conf file. Output from m4 is put
> into the pipe that is then read by the parent process.
>
> Here is what I've done so far / learned:
> After boot, I log in and stop the syslogd.
>
> I then truss (using -a -f and -sall flags) syslogd with the "-d" flag.
>
> On Qemu, the syslog daemon stops after the child process exits. No
> information generated by the child process and put into the pipe gets to the
> parent process before the hang. When I send SIGINT twice (hit ctrl-c twice
> -- one ctrl-c does not unblock pipe), the parent job sees the data in the
> pipe and completes reading data in the pipe before the syslog daemon exits
> due to the SIGINT.
>
> I thought the issue might be related to something that m4 was doing, so I
> replaced it with a shell script that output text like that actually output
> by m4 (I manually parsed the syslog.conf file). I saw the same behaviour -
> the syslogd parent process hung about the time the child process exited.
>
> I tried this on real Sparc hardware with the same OS. On real Sparc hardware
> the data appears in the pipe for use by the parent process about the time
> the child process exits.
>
> I thought the Qemu parent process might not be getting a SIGCLD or the
> SIGCLD might not be sent by the child when it exits. So I have tried sending
> SIGCLD manually using "kill".
> If I send SIGCLD twice (once does not unblock
> the pipe, but I do see system activity from truss with this first signal),
> the pipe is unblocked. The results are not consistent after the pipe is
> unblocked. Syslogd may post messages to the log file, it may take injecting
> new messages using "logger" to cause the backlog of messages to get to the
> log file, or I may need to restart syslog daemon (and SIGCLD again) to get
> messages to the log file.
>
> What to do next?
> I am not sure what to do next to help isolate what is going on (or not going
> on that needs to be). It looks like something with signals is not working
> correctly. I could try putting together a simple program to create a pipe
> much as is done in syslogd to try to replicate the issue with pipes in a
> simple way. But, I'm not sure how to dig deeper even if I were able to
> replicate the issue with a small program.
>
> Another thought is to create a Linux / Sparc32 machine to see if this issue
> is apparent there, as well. Having a simple program as noted above might
> help with this.
>
> Suggestions?

The signal or pipe handling code in the OS probably uses a corner case of
some instruction which is emulated incorrectly. The other usual suspect is
the no-fault mode in the MMU.

In the former case, you could try enabling -d in_asm and checking the log for
anything unusual around the time the signal is delivered. The log is going to
be huge though.

For the no-fault mode, you could try changing the MMU_NF handling code in
mmu_helper.c somehow.

Alternatively, if you have access to x86 Solaris, you could try to make a
solaris-user emulator so that only the syslogd process would be emulated.
I've sent rough initial patches for that earlier.

>
>
> Other Info:
> QEMU-1.1.0rc2
> Openbios-1057
> SunOS Release 5.8 Version Generic_108528-11 32-bit
>
>
> Respectfully,
> Paul