Hi Philip, thanks for answering. Let's go ...
On Sat, Dec 29, 2012 at 04:10:05AM -0800, Philip Guenther wrote:
> Your case, as far as you described it, is not the same as frantisek holop's.

Right. Not totally the same. But some similarities.

> Most of the descriptions I've seen have been too imprecise to help in
> diagnosis.
> "It freezes somewhere after "starting network daemons" and "starting
> local daemons". I
> tried to disable services I do not essentially need or to substitute
> them with other solutions. So far no findings here."
>
> Freezes 'somewhere'? Hard to make hypotheses about the cause when
> we're not told what processes were started, or whether it's consistent
> from freeze to freeze. If you turn on ddb.console=1 in sysctl.conf

ddb.console=1 is turned on now. I will check the next time the freeze occurs.

> can you break into ddb when it hangs?

Shout at me, but the magic key mentioned in the manpage is ctrl+c on i386, right?

> What's trace and ps show in
> that case? show bcstats? If you've performed tests of various sorts,
> what did they show? Negative results are sometimes _more_ important
> than positive results; why bother doing a test if you're going to
> throw out the result? What hypotheses have been *excluded* by your
> test results?

First rc.conf.local:

    sendmail_flags="-L sm-mta -C/etc/mail/sendmail.cf -bd -q30m"
    named_flags=""
    httpd_flags="-DSSL -u"
    ftpproxy_flags=""
    tftpd_flags="-4 -l 192.168.xx.xx /tftpboot"
    ifstated_flags=""
    dhcpd_flags="xl0"

additionally rc.local:

    /usr/local/sbin/sockd -D
    /usr/local/sbin/squid

I used the old style of starting local daemons to rule out problems with the new rc.d system. It was just a guess, and unfortunately it gave no result. Please note: I did not mean to say there are problems with the recently introduced rc.d system. I only wanted to see whether this changes anything. It did not.

Now what did I mean by "somewhere"? Randomly. The freeze happens randomly after starting one of the daemons. There is no pattern.
Sometimes it freezes after starting sshd, sometimes later. In one case the freeze came after the login prompt appeared. In most cases it's earlier.

What else did I try?

o I substituted sockd (dante) with nylon. Result: no freeze for three days. At first I thought I was lucky and had found the problem. But then again a freeze.
o I disabled ifstated after a freeze occurred just after starting this daemon. One day no freeze. But then again: freeze.
o I disabled ntpd completely because it's possible to operate the box with slightly inaccurate time. So ntpd can be excluded for sure!

What does this prove? Is it possible to exclude dante/nylon/ifstated for sure? Not really. Maybe it's a combination I have not found so far. I have not disabled squid. I do not use NAT, so disabling squid and dante makes this box worthless (i.e. me offline).

> The title of the original thread was "snapshots total freeze", but
> there were dmesg's in the thread showing Aug kernel builds; for those
> who haven't tried running a (recent) snapshot, does your problem
> reproduce or change symptoms when you do?

For now I don't want to update the system to a snapshot. My primary reason is that this would imply a complete new installation when 5.3 comes out. The update process is described from stable to stable and nothing else. I hope it's possible to find something without switching to current/snapshot. This box survived two months of 5.2, so maybe it will survive the next four months too :\
Shout at me, but I am a -stable user.

> Is this consistent across hardware? Drop another machine into place
> where the freezing one is; does it freeze too?

It is consistent across hardware. I tested other hardware with some differences:

o SATA drive instead of IDE.
o other NICs
o faster CPU (and a heavy duty fan that gave me the ability to make a guess on CPU load, which was confirmed by j...@osn.de using a VM).

Additionally I ran a `dd if=/dev/rwd0c of=/dev/null bs=1m` as suggested on the list. No errors.
What makes me wonder is the following: Why did those freezes occur on 5.2 and on snapshots starting in November?

My box runs as a gateway using pppoe(4). Again I guessed: Maybe something "from the evil internets", like that nasty bug we once had with protocol 0 (maybe you remember the guy running nmap protocol scans through PF). So for some days I did not power on the DSL modem during boot. But no success. The box froze during boot after one or two days, with the modem powered off.

I think this is really the only thing I can exclude for sure. Because my modem was switched off, it cannot be something triggered from the "evil internets". It must be triggered from my local site. And additionally, it must be triggered from within this box, because for some days I powered on this box alone, i.e. all other machines on my local network were still switched off. Again: freezes after some days.

But the network is still on topic: Someone claimed he had no freezes if he disabled logging in PF. pflogd is started _after_ PF is enabled. Did anyone check what happens if pflogd is started before PF? Maybe I will give it a try. It's just that I feel uncomfortable hacking /etc/rc. This file is not intended to be changed by users, right?

Additional information: To save electrical energy I power off all my machines at night and during longer breaks in the day. So I usually boot two or three times a day, which gives me enough tries to trigger the problem. What is funny: rebooting the machine just to trigger the bug did not work. I have no good explanation.

Thanks for the ideas and suggestions for more tests.

Regards
Eps