Re: Complete disk disaster

Ramiro Aceves Fri, 26 Aug 2005 02:55:08 -0700

Matty wrote:
> On Wed, 24 Aug 2005, Stuart Henderson wrote:
> 
>> --On 24 August 2005 10:37 +0200, Ramiro Aceves wrote:
>>
>>> pciide0:0:1: bus-master DMA error: missing interrupt, status=0x61
>>> wd1a: device timeout reading fsbn 1489200 of 1489200-1489203 (wd1 bn
>>> 1489263; cn 1477 tn 7 sn 6), retrying
>>> wd1: soft error (corrected)
>>> wd1(pciide0:0:1): timeout
>>>     type: ata
>>>     c_bcount: 2048
>>>     c_skip: 0
>>> pciide0:0:1: bus-master DMA error: missing interrupt, status=0x61
>>> wd1a: device timeout reading fsbn 1486176 of 1486176-1486179 (wd1 bn
>>> 1486239; cn 1474 tn 7 sn 6), retrying
>>> wd1: soft error (corrected)
>>
>> [etc]
>>
>> All hard drives have bad blocks, most hard drives now have some spare
>> capacity. As the drive detects bad or failing blocks, the spare blocks
>> are automatically remapped over the bad blocks. This is internal to
>> the drive - by the time you start noticing drive errors, the drive is
>> usually unable to remap any more blocks.
> 
> 
> smartmontools does a great job of notifying you before this happens.
> When you start smartd to alert you when S.M.A.R.T. attributes change,
> you can watch the drive slowly die over time. smartmontools is part of
> the OpenBSD ports tree in case you are interested in giving it a spin.
> 
>>
>> Sometimes the manufacturer's drive-test tools can be useful
>> (Hitachi/IBM's DFT can do some basic tests on drives from other
>> manufacturers too). There's also a commercial program Spinrite which
>> claims to have good stress-tests.
> 
> 
>


Hello all,

First, I want to thank everyone who helped me with this weird issue.
Matty, thanks for your info, but this is a 10-year-old disk and it does
not support the SMART facility. :-(

I have been doing some tests. I removed the drive and put it in an
older Pentium 133 MHz machine (changing the master/slave settings, of
course). I installed OpenBSD there, exercised the drive heavily until
I had filled it with plenty of files, and ran fsck: it reported no
errors. Everything was just fine.
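
For anyone who wants to repeat this kind of "fill the disk and check"
test, here is a minimal sketch in Python. It is not the procedure I
actually used (I just copied lots of files around and ran fsck); the
function name write_and_verify and the default sizes are made up for
illustration. It writes files of random data, remembers a SHA-256
digest of each one, and reads them back to compare.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, file_count=8, file_size=1 << 20):
    """Write random files, read them back, and compare SHA-256 digests.

    Returns the list of paths whose contents did not read back intact.
    All names and defaults here are illustrative assumptions.
    """
    expected = {}
    for i in range(file_count):
        path = os.path.join(directory, f"stress_{i:04d}.bin")
        data = os.urandom(file_size)
        with open(path, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push the data to the disk
        expected[path] = hashlib.sha256(data).hexdigest()

    mismatches = []
    for path, digest in expected.items():
        with open(path, "rb") as fh:
            if hashlib.sha256(fh.read()).hexdigest() != digest:
                mismatches.append(path)
    return mismatches


if __name__ == "__main__":
    # A temporary directory is used here; point it at a directory on the
    # suspect disk to exercise that disk instead.
    with tempfile.TemporaryDirectory() as tmp:
        bad = write_and_verify(tmp, file_count=4, file_size=64 * 1024)
        print("mismatched files:", bad)
```

Note that the read-back may be served from the buffer cache rather than
from the platters, so a clean result with a small amount of data proves
little. Writing more data than the machine has RAM makes the check more
meaningful.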


I moved the drive back to the "modern" AMD Athlon 1200 MHz machine and,
after the same hard disk marathon, ran fsck again: no errors!
I can think of three possible explanations:


- There was a bad cable connection. Unplugging and plugging the cable
again fixed the problem.

- I have done these tests with the box _opened_. Perhaps there is a
heat problem with the disk, which I am going to investigate further
(this disk sits between the main disk and the floppy drive, so I
guess it gets hot).

- I have not waited long enough for the problem to occur.
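
To see whether the problem comes back, one option is to count the
"device timeout" lines in the kernel messages from time to time. The
sketch below is only an illustration: the regular expression is based
solely on the messages quoted earlier in this thread, the assumption
that partition letters run from a to p is mine, and count_timeouts is a
made-up name. Other kernels or drivers may format these lines
differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   wd1a: device timeout reading fsbn 1489200 of 1489200-1489203 (wd1 bn
# Assumption: a partition letter a-p follows the drive name.
TIMEOUT_RE = re.compile(r"^(wd\d+)[a-p]: device timeout reading fsbn (\d+)")


def count_timeouts(log_text):
    """Return a Counter mapping drive name (e.g. 'wd1') to timeout count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample taken from the messages quoted at the top of this thread.
SAMPLE = """\
pciide0:0:1: bus-master DMA error: missing interrupt, status=0x61
wd1a: device timeout reading fsbn 1489200 of 1489200-1489203 (wd1 bn
1489263; cn 1477 tn 7 sn 6), retrying
wd1: soft error (corrected)
pciide0:0:1: bus-master DMA error: missing interrupt, status=0x61
wd1a: device timeout reading fsbn 1486176 of 1486176-1486179 (wd1 bn
1486239; cn 1474 tn 7 sn 6), retrying
wd1: soft error (corrected)
"""

if __name__ == "__main__":
    for drive, count in count_timeouts(SAMPLE).items():
        print(drive, count)
```

Feeding it the saved output of dmesg instead of SAMPLE would show
whether the count grows once the case is closed and the disk warms up.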


Many thanks to all.
I will keep you informed if the problems come back.


Ramiro.
EA1ABZ.
