On 2013-06-18 13:05, Stan Hoeppner wrote:
> On 6/18/2013 7:59 AM, Chris Purves wrote:
>> After upgrading to wheezy, I get a system hang every one or two days where 
>> the system becomes completely unresponsive and I need to do a cold boot.  
>>
>> This is an older machine with an Athlon processor.  I'm not running X.  I 
>> don't see anything unusual in the logs.  The last entry in syslog is 
>> typically a cron job, but not always the same one.  The system seems to 
>> freeze without any warning.
>>
>> I tried downgrading the kernel back to the squeeze version (2.6) and it 
>> still locks up.  Before upgrading to wheezy I resized a few of the 
>> partitions.  Other than that, nothing else has changed and everything had 
>> been running fine for years.
>>
>> I'd appreciate any help in debugging this problem.
> 
> It's not a kernel/software problem as you're not seeing kernel panics,
> nothing in the logs.  Could be DRAM but it's unlikely.  Given that
> marginal silicon typically fails within hours/days of initial use and
> rarely thereafter, it's probably not a DIMM gone bad as someone else
> suggested.
> 
> You said this system is "older" and housing an Athlon CPU.  There were 5
> generations of Athlon produced from 1999 to 2005.  Thus this box could
> be anywhere from 7 to 14 years old.  On machines of this age you need to
> check/test/troubleshoot/replace hardware in the following order:
> 
> 1.  CPU fan -- rarely last 7 years, let alone 14.  Some models may lose
>     80% of their nominal RPM with age, yet without emitting noticeable
>     noise.  The heatsink may get just enough airflow to allow a few
>     days of run time.  When the fan fails completely, the box locks up
>     in a few minutes.

The CPU fan is about three years old.  While debugging this earlier, I ran 
'sensors' from a cron job every five minutes, and the CPU temperature never 
exceeded 60 C.
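
For anyone wanting to repeat that kind of logging, here is a minimal sketch 
in Python.  It assumes 'sensors' prints temperature lines shaped like 
"temp1:  +45.0°C  (high = +70.0°C)"; the actual labels and layout depend on 
the chip driver.  The function names and the log path are hypothetical, not 
what I actually ran.

```python
import os
import re
import subprocess
import time

# Assumed line shape: "temp1:        +45.0°C  (high = +70.0°C)".
# Only the first reading on each line is captured; limits in parentheses are ignored.
TEMP_LINE = re.compile(r"^([^:\n]+):[ \t]+([+-]?\d+(?:\.\d+)?)[ \t]*°C", re.MULTILINE)


def parse_temps(sensors_output):
    """Return {label: degrees Celsius} for each temperature line found."""
    return {
        label.strip(): float(value)
        for label, value in TEMP_LINE.findall(sensors_output)
    }


def log_temps(path="/var/log/cpu-temps.log"):
    """Append one timestamped line of readings to a (hypothetical) log file."""
    output = subprocess.check_output(["sensors"], universal_newlines=True)
    temps = parse_temps(output)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    readings = " ".join(
        "{}={:.1f}".format(label, value) for label, value in sorted(temps.items())
    )
    with open(path, "a") as log:
        log.write("{} {}\n".format(stamp, readings))
        log.flush()
        # Force the line to disk so the last reading survives a hard freeze.
        os.fsync(log.fileno())
```

A script that ends with a call to log_temps() could be run from a crontab 
entry such as "*/5 * * * *".  The fsync is there because buffered data that 
has not reached the disk is lost when the machine locks up.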

> 2.  PSU fan -- a failing fan will cause MOSFET/cap/etc. overheating,
>     which can cause "random" lockups, reboots, and other odd behavior.
> 
> 3.  PSU itself -- a failed fan can permanently damage MOSFETs/caps/etc.
>     Even with a good fan, PSU components can fail with age.

PSU is near the top of my list.  I will be replacing it soon barring some new 
discovery.

> 4.  Removable media drives -- floppy/CD/DVD-ROM can fail in odd ways
>     sending spurious high voltage signals or shorting wires, locking up
>     the motherboard, or causing random reboots.  Disconnect their
>     data cable and power leads and run without them.

These were already disconnected.

> 5.  The motherboard.  Even with good cooling over the life of a machine
>     the motherboard can still simply fail.  You may not be able to find
>     bulged caps nor burn marks on VRMs, no visible signs of failure.

It could be that the motherboard is at the end of its life.  I've had it for 
eight years and it was already used when I got it.  What makes me hesitate is 
that the hangs didn't start until I upgraded to wheezy.  If they had started 
before the upgrade, or a week or two after it, I would more readily suspect a 
hardware failure.

>     Case in point:  I had a Biostar Socket A nForce2 400 motherboard
>     w/Athlon XP 2500 simply give up the ghost in 2011 in a similar
>     manner.  It locked up a few times over a period of a week or so,
>     then simply wouldn't POST.
> 
>     I built that machine in Aug 2003 and it lasted 8 years.  I started
>     with two 92x25mm Panaflo case fans, plus the 80x25 PSU fan.
>     I replaced the PSU fan with an NMB boxer, and the case fans with
>     two Nidec Beta Vs, twice during the life of the box.  All of the
>     fans were fully functional at the time of replacement.  This was
>     proactive maintenance.  The box had 110 CFM of properly directed
>     airflow during its lifespan.  Compare this to the ~30 CFM of a
>     quiet Dell, HP, or IBM machine.  Anyone who knows hardware knows
>     that these are all top shelf 12VDC fans.  The PSU is still running,
>     in another box, as are the two DIMMS and the CPU.  The motherboard
>     simply gave up the ghost after 8 years of 24x7 operation.  Let's
>     hope that isn't the case here.
> 
> 


-- 
Chris Purves
Visit my blog: http://chris.northfolk.ca

