On Thu, 6 Feb 2003 14:13, Jason Lim wrote:
> I was wondering what kind of failures you experience with long-running
> hardware.
I don't recall seeing a computer that had been in service for more than 3
months fail in any way not associated with movement.  Moving parts (fans
and hard drives) die.  Expansion boards and motherboards can die if they
are moved or tweaked.

If you only clean air filters while leaving the machine in place, and if
the fans are all solid with ball bearings, then it should keep running for
many years.

> Most of us run servers with very long uptimes (we've got a server here
> with uptime approaching 3 years, which is not long compared to some, but
> we think it is pretty good!).

I think that's a bad idea.  I've never seen a machine that had been up for
more than a year reboot correctly afterwards.  In my experience, after
more than a year of running, someone will have changed something that
makes either the OS or the important applications fail to start correctly,
and will have forgotten what they did (or left the company).

> Most of these servers either have 3ware RAID cards, or have some other
> sort of RAID (scsi, ide, software, etc.). The hard disks are replaced as
> they fail, so by now some RAID 1 drives are actually 40Gb when only about
> 20Gb is used, because the RAID hardware cannot "extend" to use the extra
> size (but this is a different issue).

Software RAID can deal with this (a rough sketch of the steps is at the
end of this message).

> Now... we can replace all the fans in the systems (eg. CPU fan, case
> fans, etc.). Some even suggested we jimmy on an extra fan going sideways
> on the CPU heatsink, so if the top fan fails at least airflow is still
> being pushed around which is better than nothing (sort of like a
> redundant CPU fan system).

Not a good idea for a server system.  Servers are designed to have air
flow in a particular path through the machine.  Change that in any way and
you might get unexpected problems.

> But how about the motherboards themselves? Is it common for something on
> the motherboard to fail after 3-4 years of continuous operation without
> failure?

I've only seen motherboards fail when having RAM, CPUs, or expansion cards
upgraded or replaced.  I've heard of CPUs and RAM failing, but only in
situations where I was not confident that they had not been messed with.

> We keep the systems at between 18-22 degrees Celsius (tending towards
> the lower end) as we've heard/read somewhere that for every degree drop
> in temperature, hardware lifetime is extended by X number of years. Not
> sure if that is still true?

Also try to avoid changes in temperature; thermal expansion is a problem.
Try to avoid having machines turned off for any period of time.  If you
are working on a server that has old hard drives, power the drives up and
keep them running (disconnected from the server) while you work, for best
reliability.  Turning an old hard drive off for 30 minutes is regarded as
a great risk.

But the best thing to do is to regularly replace hard drives.  Hard drives
more than 3 years old should be thrown out (a sketch of checking drive age
via SMART is also at the end of this message).  It also helps to only buy
reasonably large hard drives (say a minimum of 40G for IDE and 70G for
SCSI).  Whenever a hard drive seems small, it's probably due to be
replaced.

-- 
http://www.coker.com.au/selinux/   My NSA Security Enhanced Linux packages
http://www.coker.com.au/bonnie++/  Bonnie++ hard drive benchmark
http://www.coker.com.au/postal/    Postal SMTP/POP benchmark
http://www.coker.com.au/~russell/  My home page
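As promised above, here is a minimal sketch of growing a Linux software
RAID 1 once both members have been swapped for larger disks.  It assumes a
reasonably recent mdadm with --grow support and an ext2/ext3 filesystem;
the device name /dev/md0 is made up, and depending on your kernel and tool
versions the filesystem may need to be unmounted before resizing.

#!/usr/bin/env python3
# Hypothetical sketch: grow a software RAID 1 array after both members
# have been replaced with larger disks, then grow the filesystem on it.
# /dev/md0 and ext3 are assumptions, not anything from this thread.
import subprocess

ARRAY = "/dev/md0"   # assumed md device name

def run(cmd):
    # Print the command before running it, and stop on any failure.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Tell the md driver to use all of the space on the (now larger) members.
run(["mdadm", "--grow", ARRAY, "--size=max"])

# Then grow the ext2/ext3 filesystem to fill the enlarged array.
run(["resize2fs", ARRAY])

You'd run it as root after the resync of the replacement disks has
finished; hardware RAID cards generally can't do the equivalent, which is
the point made above.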
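On retiring drives after roughly 3 years: one rough way to keep track is
to read the SMART power-on-hours counter.  This is only a sketch under my
own assumptions: the device names are invented, smartctl must be
installed, and the attribute name and units vary between drive models.

#!/usr/bin/env python3
# Hypothetical sketch: warn about drives whose SMART power-on hours
# exceed roughly 3 years.  Device names are assumptions.
import subprocess

DRIVES = ["/dev/hda", "/dev/hdc"]      # assumed IDE devices
THREE_YEARS_HOURS = 3 * 365 * 24

for dev in DRIVES:
    result = subprocess.run(["smartctl", "-A", dev],
                            capture_output=True, text=True)
    for line in result.stdout.splitlines():
        if "Power_On_Hours" in line:
            raw = line.split()[-1]     # raw value is the last column
            # Some drives report minutes or half-minutes instead of
            # hours, so treat the number as a hint, not gospel.
            if raw.isdigit() and int(raw) > THREE_YEARS_HOURS:
                print(f"{dev}: {raw} power-on hours - consider replacing")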