Folks,

Over the past year, I have replaced around 20 IDE hard drives in 5 different computers running Debian because of drive faults. I know IDE is "consumer quality" and no more, but the failure rate can't really be that high.
The drives are mostly made by IBM/Hitachi, and they run 24/7, as the machines in question are routers, firewalls, or servers. I would replace a drive as a result of symptoms such as frequent segmentation faults, corrupt files, and zombie processes. In all cases I replaced the drive, transferred the data (mostly without problems), got the machine back into a running state, and then ran `badblocks -svw` on the disk. Usually it would turn up a number of bad blocks, typically in excess of 100.

The other day, I received a replacement drive from Hitachi, plugged it into a test machine, ran badblocks, and verified that there were no bad blocks. I then put the drive into a firewall, synced the data (ext3 filesystems), and was ready to leave the computers alone and head off to the lake... when the new firewall kept reporting bad reloc headers in libraries, APT stopped working, there were random single-letter flips in /var/lib/dpkg/available (e.g. swig's Version field was labelled "Verrion"), and the system kept reporting segfaults. I consequently plugged the drive into another test machine and ran badblocks -- and it found more than 2000 bad blocks, on a drive that had had none the day before.

Just now, I got another replacement from Hitachi (this time it wasn't a "serviceable used part" but a new drive), and out of the box it featured 250 bad blocks.

My vendor says that bad blocks are normal, and that I should be running the IBM Drive Fitness Test on the drives to verify their functionality. Moreover, he says there are tools to remap bad blocks. My understanding was that EIDE drives remap bad sectors automatically and transparently, so that if badblocks actually sees a bad block, the drive has run out of spare sectors and should be declared dead. Is this not the case? (A rough sketch of the checks I have in mind is below, before my sig.)

The reason I am posting this is that I need mental support. I'm going slightly mad. I seem to be unable to buy non-bad IDE drives, be they IBM, Maxtor, or Quantum, so I spend excessive time replacing drives and keeping systems up by brute force. And when I look around, there are thousands of consumer machines that run day in, day out without problems. It may well be that Windoze handles a degrading hard drive more gracefully (I don't want to say this is a good thing). It may be that IDE hates me. I don't think it's my IDE controller: there are 5 different machines involved, and the chance that all five controllers report bad blocks where there aren't any, yet otherwise detect the drives fine (and never report the dreaded dma:intr errors), seems negligible.

So I call to you and would like to know a couple of things:

- does anyone else experience this?
- does anyone know why this is happening?
- is it true that bad blocks are normal and can be handled properly?
- why is this happening to me?
- can bad blocks arise from static discharge or impurities? When I replace disks, I usually put the new one into the case loosely and leave the cover open. The disk is not subjected to any shocks or the like; it sits still as a rock, it's just not affixed.

I will probably never buy IDE again. But before I bash companies like Hitachi for crap quality control, I would like to make sure that I am not the one screwing up. Any comments?
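For concreteness, this is the sort of check sequence I mean -- a rough sketch only; /dev/hdc and the partition name are placeholders for whatever the drive actually is, and the SMART queries assume the smartmontools package is installed:

    # read-only surface scan (unlike -svw, this does not destroy data)
    badblocks -sv /dev/hdc

    # ask the drive itself: attribute 5 (Reallocated_Sector_Ct) shows how
    # many sectors the firmware has already remapped to spares behind my back
    smartctl -a /dev/hdc

    # kick off the drive's built-in long self-test, then read the result later
    smartctl -t long /dev/hdc
    smartctl -l selftest /dev/hdc

    # if I decide to keep a marginal disk anyway, e2fsck can record the bad
    # blocks in the filesystem's bad-block inode so ext3 avoids them
    e2fsck -c /dev/hdc1

If the reallocated-sector count keeps climbing, or badblocks sees errors at the OS level at all, my reading is that the spare-sector pool is exhausted and the drive is on its way out -- but please correct me if that logic is wrong.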
--
Please do not CC me when replying to lists; I read them!

 .''`.     martin f. krafft <[EMAIL PROTECTED]>
: :'  :    proud Debian developer, admin, and user
`. `'`
  `-       Debian - when you have better things to do than fixing a system

Invalid PGP subkeys? Use subkeys.pgp.net as keyserver!