On Wednesday 17 February 2010 08:49:28 Harry Putnam wrote:
> I have caught the freeze in the early stages before completely losing
> the network when just mouse and keyboard became unresponsive, was able
> to ssh in and noticed that restarting hald held off the freeze for
> some (again unspecified) amount of time.
> 
> So cutting the lengthy narrative down a bit, and briefly put, I'm
> looking for anything unusual that is causing this.  The hdc messages
> are the only odd thing I'm seeing.
> 
> Something appears to be jamming up the hal layer somehow, but not
> leaving findable tracks.  At least not findable by someone with
> many yrs experience with linux but not much real debugging of
> complicated problems under his belt.

You say the box runs ssh, implying that other hosts are nearby, so what I 
would suggest is to configure your syslogger to send all logs to another host 
and have that host write them to a known location.

I find that machines that freeze often still send logs to syslog properly 
right up to the moment of the freeze, but these do not get written to disk as 
IO is blocked. Then we restart the box, guaranteeing that the logs are lost 
:-)

Set up remote logging and then simply leave it until the machine freezes 
again; that will hopefully give you the logs you need to identify the problem. 
To save disk space you can configure logrotate on the remote logger to delete 
the previous day's logs - you don't need logs from days when the box was 
working fine.
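
If you don't want to reconfigure a full syslog daemon on the second host just 
for this, a throwaway receiver is enough. The following is only a minimal 
sketch, not a replacement for a real syslogger: it assumes the freezing box 
forwards over plain UDP (with classic syslogd that is typically a line of the 
form "*.* @loghost" in syslog.conf, where loghost is a placeholder; check the 
documentation for whichever syslogger you actually run). The function names 
and the log path are made up for illustration.

```python
import socket


def open_listener(host="0.0.0.0", port=514):
    """Bind a UDP socket for incoming syslog datagrams.

    514 is the conventional syslog UDP port; binding to it normally
    requires root. Pass another port if the sender is set up to match.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_logs(sock, out_path, max_messages=None):
    """Append each received datagram to out_path, one line per message.

    Every line is prefixed with the sender's address and flushed at
    once, so the last messages before a freeze are handed to the OS
    on the logging host immediately. Runs forever when max_messages
    is None. Returns the number of messages written.
    """
    count = 0
    with open(out_path, "a", encoding="utf-8") as out:
        while max_messages is None or count < max_messages:
            data, (sender, _port) = sock.recvfrom(8192)
            line = data.decode("utf-8", errors="replace").rstrip("\n")
            out.write(f"{sender} {line}\n")
            out.flush()
            count += 1
    return count


# Hypothetical usage on the logging host (path is an assumption):
#   receive_logs(open_listener(), "/var/log/remote-freeze.log")
```

Start it on the logging host, point the problem box at it, and after the next 
freeze read the tail of the output file.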

Another option is to look at the pattern here: one day, out of the blue, a 
stable system developed problems, and they still surface at random times. This 
is one of the characteristics of failing hardware. Have you done a full, 
thorough hardware test, including such things as memtest and SMART?

-- 
alan dot mckinnon at gmail dot com
