After not receiving any response to my last post, I kept digging into my problem server, and I'm lost...
The problem: After upgrading the server from 6.2 to 7.3 (fresh install), it started to freeze up for no apparent reason. Nothing in the logs, no high load, no specific process causing any kind of error that I can find. The freezeups have been happening regularly...pretty much every night now.
What I've done so far:
- Ran memtest86 for hours on end - no errors (with only one specific configuration option causing the system to freeze again: whenever I told it to do a BIOS All test, as opposed to a BIOS Standard. As soon as I picked 'All', it would rescan the system cache and memory and freeze. I'm not sure what that means.)
- Removed RAM sticks, even replaced them - no difference
- Removed attached SCSI chain, even pulled out the Adaptec controller card - no difference
- Removed one of the CPUs and let it run with only 1 (and booting the correct kernel) - no difference
- Swapped CPUs - no difference
- Removed hard drives that aren't needed - no difference
So I'm faced with two things now that I can think of: a) the power supply, or b) the motherboard. One thing I've noticed is that out of the 10 servers I have running, this is the only one that every few hours has to re-adjust its clock. All the other machines can run for days without needing an adjustment. This one seems to do it a few times a day. So maybe that's something to look at.
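A rough way to put a number on that clock behaviour would be to note the offset each time the clock gets corrected and fit a line through the readings. The sketch below is only an illustration: the function name `drift_ppm` and the sample values are hypothetical, not measurements from this server, and it assumes the offsets (in seconds) have already been collected from whatever tool is doing the adjusting.

```python
def drift_ppm(samples):
    """Estimate clock drift in parts per million.

    samples: list of (elapsed_seconds, offset_seconds) pairs, where
    offset_seconds is how far the system clock was off at that moment.
    Uses an ordinary least-squares slope through the points.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_t = sum(t for t, _ in samples) / n
    mean_o = sum(o for _, o in samples) / n

    # Slope = covariance(t, offset) / variance(t)
    num = sum((t - mean_t) * (o - mean_o) for t, o in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples have the same timestamp")

    # Slope is seconds of error per second of runtime; scale to ppm.
    return (num / den) * 1e6


# Hypothetical readings: the clock gains 0.36 s every hour.
readings = [(0, 0.0), (3600, 0.36), (7200, 0.72)]
print(f"{drift_ppm(readings):.1f} ppm")
```

Running the same calculation on one of the healthy machines would give a baseline to compare against; a much larger figure on the problem box would point more toward the board than toward the power supply, though that is a guess rather than a rule.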
Tonight will be another test - I've dropped a different 128 MB stick in yet again (the other 4 GB is sitting on the table). If by morning it's dead again, I honestly don't know what else to look for. I'll probably swap the board. Someone please give me some idea of what may be causing this. The system is two years old (give or take 6 months) and was running for months at a time under 6.2...now I can't even get one week of uptime.
-- W | I haven't lost my mind; it's backed up on tape somewhere. +-------------------------------------------------------------------- Ashley M. Kirchner <mailto:[EMAIL PROTECTED]> . 303.442.6410 x130 IT Director / SysAdmin / WebSmith . 800.441.3873 x130 Photo Craft Laboratories, Inc. . 3550 Arapahoe Ave. #6 http://www.pcraft.com ..... . . . Boulder, CO 80303, U.S.A.