On Thu, Aug 07, 2008 at 09:27:24AM +0200, Harald Dunkel wrote:
> I've got a configuration issue with Raidframe: Our
> gateway/firewall runs a raid1 for the system disk.
> No swap partition.
>
> Recently one of the raid disks (wd0) showed some
> problem:
>
> Aug 2 17:22:35 fw01 /bsd: wd0(pciide0:0:0): timeout
> Aug 2 17:53:52 fw01 /bsd: type: ata
> Aug 2 17:53:52 fw01 /bsd: c_bcount: 16384
> Aug 2 17:53:52 fw01 /bsd: c_skip: 0
> Aug 2 17:53:52 fw01 /bsd: pciide0:0:0: bus-master DMA error: missing
> interrupt, status=0x21
> Aug 2 17:53:52 fw01 /bsd: pciide0 channel 0: reset failed for drive 0
> Aug 2 17:53:52 fw01 /bsd: wd0d: device timeout writing fsbn 46172704 of
> 46172704-46172735 (wd0 bn 50368000; cn 49968 tn 4 sn 4), retrying
> :
> :
> Aug 2 17:53:52 fw01 /bsd: wd0d: device timeout writing fsbn 46172704 of
> 46172704-46172735 (wd0 bn 50368000; cn 49968 tn 4 sn 4)
> Aug 2 17:53:52 fw01 /bsd: raid0: IO Error. Marking /dev/wd0d as failed.
> Aug 2 17:53:52 fw01 /bsd: raid0: node (Wpd) returned fail, rolling forward
> Aug 2 17:53:52 fw01 /bsd: pciide0:0:0: not ready, st=0xd0<BSY,DRDY,DSC>,
> err=0x00
> Aug 2 17:53:52 fw01 /bsd: pciide0 channel 0: reset failed for drive 0
> Aug 2 17:53:52 fw01 /bsd: wd0d: device timeout writing fsbn 46137472 of
> 46137472-46137503 (wd0 bn 50332768; cn 49933 tn 4 sn 52), retrying
> :
> :
> Aug 2 17:53:53 fw01 /bsd: pciide0:0:0: not ready, st=0xd0<BSY,DRDY,DSC>,
> err=0x00
> Aug 2 17:53:53 fw01 /bsd: pciide0 channel 0: reset failed for drive 0
> Aug 2 17:53:53 fw01 /bsd: wd0d: device timeout writing fsbn 46152320 of
> 46152320-46152343 (wd0 bn 50347616; cn 49948 tn 0 sn 32)
> Aug 2 17:53:53 fw01 /bsd: raid0: node (Wpd) returned fail, rolling forward
>
>
> Surely wd0 is defect. Can happen. But my problem is that the
> machine became unresponsive for 30 minutes. Even a ping did
> not work. This is not what I would expect from a raid system.
>
> What would you suggest to reduce the waiting time? 2 minutes
> would be OK, but 30 minutes downtime are a _huge_ problem.
>
> Do I have to expect the same for a raid5 built from 9 disks, but
> with a higher probability, because there are more disks in the
> loop?
Your best bet is to replace the disk. A 30-minute wait does seem odd, though; I have a similar situation where one disk is having problems and needs to be reset, but that only stalls the machine for about a minute.

You can mark the disk as failed and replace it before the other disk gives out as well (after all, there is not much point in relying on a faulty disk).

Ciao,
Ariane
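P.S. Roughly what I have in mind, as a sketch only: the raid0 and wd0d names are taken from your log, the partition layout on the replacement disk is an assumption, and you should double-check the flags against raidctl(8) on your release.

    raidctl -f /dev/wd0d raid0    # mark the failing component as failed (the kernel
                                  # already did this for you, judging by the log)
    # physically swap out wd0, recreate a matching 'd' partition on the new disk, then:
    raidctl -R /dev/wd0d raid0    # reconstruct back onto the replaced component
    raidctl -s raid0              # check component and reconstruction status

If you configure a hot spare (raidctl -a) the rebuild target is already in place when a disk dies, though as far as I know you still have to start the reconstruction to it yourself with -F.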