Re: Various system freeze

epsilon Sat, 29 Dec 2012 10:07:33 -0800

Hi Philip,

thanks for answering. Let's go ...

On Sat, Dec 29, 2012 at 04:10:05AM -0800, Philip Guenther wrote:
> Your case, as far as you described it, is not the same as frantisek holop's.

Right. Not totally the same. But some similarities.

> Most of the descriptions I've seen have been too imprecise to help in
> diagnosis.
>  "It freezes somewhere after "starting network daemons" and "starting
>   local daemons". I tried to disable services I do not essentially
>   need or to substitute them with other solutions. So far no findings
>   here."
> 
> Freezes 'somewhere'?  Hard to make hypotheses about the cause when
> we're not told what processes were started, or whether it's consistent
> from freeze to freeze.  If you turn on ddb.console=1 in sysctl.conf

ddb.console=1 turned on now. Will check the next time the freeze
occurs.
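
A minimal sketch of how to confirm the setting is really in the file
before the next boot. The function name has_ddb_console is made up for
illustration; only the ddb.console=1 line itself comes from the
suggestion above.

```shell
# has_ddb_console FILE: succeed if FILE contains an active
# (uncommented) line that sets ddb.console to 1.
has_ddb_console() {
    grep -Eq '^[[:space:]]*ddb\.console[[:space:]]*=[[:space:]]*1([[:space:]]|#|$)' "$1"
}
```

Running `has_ddb_console /etc/sysctl.conf && echo enabled` should print
"enabled" once the line is in place, and `sysctl ddb.console` shows the
value the running kernel actually uses.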

> can you break into ddb when it hangs?

Shout at me, but the magic key mentioned in the manpage is ctrl+c on
i386, right?

> What's trace and ps show in
> that case?  show bcstats?  If you've performed tests of various sorts,
> what did they show?  Negative results are sometimes _more_ important
> than positive results; why bother doing a test if you're going to
> throw out the result?  What hypotheses have been *excluded* by your
> test results?

First rc.conf.local:

sendmail_flags="-L sm-mta -C/etc/mail/sendmail.cf -bd -q30m"
named_flags=""
httpd_flags="-DSSL -u"
ftpproxy_flags=""
tftpd_flags="-4 -l 192.168.xx.xx /tftpboot"
ifstated_flags=""
dhcpd_flags="xl0"

additionally rc.local:

/usr/local/sbin/sockd -D
/usr/local/sbin/squid

I used the old style for starting local daemons to rule out problems
with the new rc.d system. It was just a guess, and unfortunately it
gave no result. Please note: I do not mean there are problems with the
recently introduced rc.d system. I only wanted to see whether this
changes anything. It did not.

Now what did I mean by "somewhere"? Randomly. The freeze happens
randomly after starting one of the daemons. There is no pattern.
Sometimes it freezes after starting sshd, sometimes later. In one case
the freeze came after the login prompt appeared. In most cases it's
earlier.

What else did I try?

o I substituted sockd (dante) with nylon. Result: For three days no
  freeze. At first I was happy, I thought I had found the problem. But
  then again a freeze.
o I disabled ifstated after the freeze occurred just after starting
  this daemon. One day no freeze. But then again: freeze.
o I disabled ntpd completely because it's possible to operate the box
  with slightly inaccurate time. So ntpd can be excluded for sure!

What does this prove? Is it possible to exclude dante/nylon/ifstated
for sure? Not really. Maybe it's a combination I have not found so far.

I have not disabled squid. I do not use NAT, so disabling squid and
dante makes this box worthless (i.e. me offline).

> The title of the original thread was "snapshots total freeze", but
> there were dmesg's in the thread showing Aug kernel builds; for those
> who haven't tried running a (recent) snapshot, does your problem
> reproduce or change symptoms when you do?

For now I don't want to update the system to a snapshot. My primary
reason is that this would imply a complete new installation when 5.3
comes out. The update process is described from stable to stable and
not for anything else. I hope it's possible to find something without
switching to current/snapshot. This box survived two months on 5.2, so
maybe it will survive the next four months too :\ Shout at me, but I
am a -stable user.

> Is this consistent across hardware?  Drop another machine into place
> where the freezing one is; does it freeze too?

It is consistent across hardware. I tested other hardware with some
differences:

o SATA drive instead of IDE.
o other NICs
o faster CPU (and a heavy duty fan that gives me the ability to make a
  guess on CPU load, which was confirmed by j...@osn.de using a VM).

Additionally I ran a `dd if=/dev/rwd0c of=/dev/null bs=1m` as suggested
on the list. No errors.
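
The same read test can be wrapped so that the result is unambiguous.
This is only a sketch: read_test is a hypothetical helper name, and the
block size is given in bytes so that it does not depend on which size
suffixes a particular dd accepts.

```shell
# read_test PATH: read PATH once from start to end and discard the data.
# dd exits non-zero on a read error, so its exit status is the result.
read_test() {
    if dd if="$1" of=/dev/null bs=1048576 2>/dev/null; then
        echo "read ok: $1"
    else
        echo "read FAILED: $1"
        return 1
    fi
}
```

Run as root, `read_test /dev/rwd0c` repeats the test above and prints
one line with the outcome.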

What makes me wonder is the following: Why did those freezes occur on
5.2 and on snapshots starting in November? My box runs as a gateway
using pppoe(4). Again I guessed: Maybe something "from the evil
internets" like that nasty bug we once had with protocol 0 (maybe
you remember the guy running nmap protocol scans through PF). So for
some days I did not power on the DSL modem during boot. But no success.
The box froze after one or two days during boot, even with the modem
powered off.

I think this is really the only thing I can exclude for sure. Because
my modem was switched off, it cannot be something triggered from the
"evil internets". It must be triggered from my local site. And
additionally, it must be triggered from within this box, because for
some days I powered on this box alone, i.e. all other machines on my
local network were still switched off. Again: Freezes after some days.

But the network is still on topic: Someone claimed he had no freezes
when he disabled logging in PF. pflogd is started _after_ PF is enabled.
Did anyone check what happens if pflogd is started before PF? Maybe I
will give it a try. It's just that I feel uncomfortable hacking
/etc/rc. This file is not intended to be changed by users, right?

Additional information: To save electrical energy I power off all my
machines at night and during longer breaks in the day. So I usually
boot two or three times a day, which gives me enough tries to trigger
the problem. What is funny: rebooting the machine just for fun to
trigger the bug did not work. I have no good explanation.

Thanks for the ideas and suggestions for more tests.

Regards
  Eps
