Actually I wrote a very simple piece of code to recover as much as possible from a hard disk the other day (jordan's disk died with a lot of code on it and we are still trying to recover it). I'll clean it up and put it up.
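Until the cleaned-up version is posted, here is a rough sketch of the general idea only, not the actual code mentioned above: read the disk block by block, and when a block fails to read, record its offset, write zeros in its place, and carry on. Everything here is an assumption made for illustration: the function name `rescue_copy`, its parameters, and the 16384-byte default block size are all hypothetical. On a raw device, seeking to the end may not report the real size, so pass `total_bytes` explicitly in that case.

```python
import os


def rescue_copy(src_path, dst_path, total_bytes=None, block_size=16384,
                read_block=os.pread):
    """Copy src to dst in fixed-size blocks, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that could not be read.
    `read_block` is a hypothetical hook (same signature as os.pread) so the
    error path can be exercised without a failing disk.
    """
    bad_offsets = []
    fd = os.open(src_path, os.O_RDONLY)
    try:
        if total_bytes is None:
            # Assumption: works for regular files; may not for raw devices.
            total_bytes = os.lseek(fd, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < total_bytes:
                want = min(block_size, total_bytes - offset)
                try:
                    data = read_block(fd, want, offset)
                except OSError:
                    # Unreadable block: remember it and keep the image aligned.
                    bad_offsets.append(offset)
                    data = b"\0" * want
                if not data:
                    break  # unexpected end of input
                dst.write(data)
                offset += len(data)
    finally:
        os.close(fd)
    return bad_offsets
```

Something like `rescue_copy("/dev/rwd0c", "disk.img", total_bytes=...)` would be the intended use, with the device path and size filled in for the disk at hand. A smaller block size loses less data around each bad spot at the cost of more reads, and the returned offsets say which parts of the image are zero-filled.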
On Mon, Jul 31, 2006 at 10:41:21AM -0400, RV Tec wrote:
> Folks,
>
> I had two crashes, on two different days, for the same reason: a dying
> hard drive. Definitely, it is really unpleasant to get caught with my
> pants down.
>
> Is there a way to test hard drives for possible failures or to foresee
> those errors?
>
> The SMART thing isn't that smart at all. Even after the server crashed
> twice due to a faulty hard drive, SMART keeps telling me everything is OK.
>
> This is a SEAGATE SATA, only 1 year old. I'd expect a longer life from those
> drives. Am I wrong?
>
> Jul 30 13:23:36 home wd0 at pciide1 channel 0 drive 0: <ST380013AS>
> Jul 30 13:23:36 home wd0: 16-sector PIO, LBA48, 76319MB, 156301488 sectors
> Jul 30 13:23:36 home wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 5
>
> Jul 29 13:53:55 home wd0(pciide1:0:0): timeout
> Jul 29 13:53:55 home type: ata
> Jul 29 13:53:55 home type: ata
> Jul 29 13:53:55 home c_bcount: 16384
> Jul 29 13:53:55 home c_bcount: 16384
> Jul 29 13:53:55 home c_skip: 0
> Jul 29 13:53:55 home c_skip: 0
> Jul 29 13:53:55 home pciide1:0:0: bus-master DMA error: missing interrupt,
> status=0x21
> Jul 29 13:53:55 home pciide1:0:0: bus-master DMA error: missing interrupt,
> status=0x21
> Jul 29 13:53:55 home wd0f: device timeout reading fsbn 1984192 of
> 1984192-1984223 (wd0 bn 30295888; cn 30055 tn 7 sn 7), retrying
> Jul 29 13:53:55 home wd0f: device timeout reading fsbn 1984192 of
> 1984192-1984223 (wd0 bn 30295888; cn 30055 tn 7 sn 7), retrying
> Jul 29 13:53:55 home wd0: soft error (corrected)
> Jul 29 13:53:55 home wd0: soft error (corrected)
> Jul 29 13:54:05 home wd0(pciide1:0:0): timeout
> Jul 29 13:54:05 home wd0(pciide1:0:0): timeout
> Jul 29 13:54:05 home type: ata
> Jul 29 13:54:05 home type: ata
> Jul 29 13:54:05 home c_bcount: 16384
> Jul 29 13:54:05 home c_bcount: 16384
> Jul 29 13:54:05 home c_skip: 0
> Jul 29 13:54:05 home c_skip: 0
> Jul 29 13:54:05 home pciide1:0:0: bus-master DMA error: missing interrupt,
> status=0x21
> Jul 29 13:54:05 home pciide1:0:0: bus-master DMA error: missing interrupt,
> status=0x21
> Jul 29 13:54:05 home wd0: transfer error, downgrading to Ultra-DMA mode 4
> Jul 29 13:54:05 home wd0: transfer error, downgrading to Ultra-DMA mode 4
> Jul 29 13:54:05 home wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 4
> Jul 29 13:54:05 home wd0(pciide1:0:0): using PIO mode 4, Ultra-DMA mode 4
> Jul 29 13:54:05 home wd0e: device timeout reading fsbn 1113568 of
> 1113568-1113599 (wd0 bn 12648112; cn 12547 tn 11 sn 43), retrying
> Jul 29 13:54:05 home wd0e: device timeout reading fsbn 1113568 of
> 1113568-1113599 (wd0 bn 12648112; cn 12547 tn 11 sn 43), retrying
> Jul 29 13:54:06 home wd0: soft error (corrected)
> Jul 29 13:54:06 home wd0: soft error (corrected)
>
>
> Thanks!
>
> RV