On Thu, Aug 07, 2008 at 09:27:24AM +0200, Harald Dunkel wrote:
> I've got a configuration issue with Raidframe: Our
> gateway/firewall runs a raid1 for the system disk.
> No swap partition.
>
> Recently one of the raid disks (wd0) showed some
> problem:
>
> Aug 2 17:22:35 fw01 /bsd: wd0(pciide0:0:0): timeout
> Aug 2 17:53:52 fw01 /bsd: type: ata
> Aug 2 17:53:52 fw01 /bsd: c_bcount: 16384
> Aug 2 17:53:52 fw01 /bsd: c_skip: 0
> Aug 2 17:53:52 fw01 /bsd: pciide0:0:0: bus-master DMA error: missing
> interrupt, status=0x21
> Aug 2 17:53:52 fw01 /bsd: pciide0 channel 0: reset failed for drive 0
> Aug 2 17:53:52 fw01 /bsd: wd0d: device timeout writing fsbn 46172704 of
> 46172704-46172735 (wd0 bn 50368000; cn 49968 tn 4 sn 4), retrying
> :
> :
> Aug 2 17:53:52 fw01 /bsd: wd0d: device timeout writing fsbn 46172704 of
> 46172704-46172735 (wd0 bn 50368000; cn 49968 tn 4 sn 4)
> Aug 2 17:53:52 fw01 /bsd: raid0: IO Error. Marking /dev/wd0d as failed.
> Aug 2 17:53:52 fw01 /bsd: raid0: node (Wpd) returned fail, rolling forward
> Aug 2 17:53:52 fw01 /bsd: pciide0:0:0: not ready, st=0xd0<BSY,DRDY,DSC>,
> err=0x00
> Aug 2 17:53:52 fw01 /bsd: pciide0 channel 0: reset failed for drive 0
> Aug 2 17:53:52 fw01 /bsd: wd0d: device timeout writing fsbn 46137472 of
> 46137472-46137503 (wd0 bn 50332768; cn 49933 tn 4 sn 52), retrying
> :
> :
> Aug 2 17:53:53 fw01 /bsd: pciide0:0:0: not ready, st=0xd0<BSY,DRDY,DSC>,
> err=0x00
> Aug 2 17:53:53 fw01 /bsd: pciide0 channel 0: reset failed for drive 0
> Aug 2 17:53:53 fw01 /bsd: wd0d: device timeout writing fsbn 46152320 of
> 46152320-46152343 (wd0 bn 50347616; cn 49948 tn 0 sn 32)
> Aug 2 17:53:53 fw01 /bsd: raid0: node (Wpd) returned fail, rolling forward
>
>
> Surely wd0 is defect. Can happen. But my problem is that the
> machine became unresponsive for 30 minutes. Even a ping did
> not work. This is not what I would expect from a raid system.
>
> What would you suggest to reduce the waiting time? 2 minutes
> would be OK, but 30 minutes downtime are a _huge_ problem.
>
> Do I have to expect the same for a raid5 built from 9 disks, but
> with a higher probability, because there are more disks in the
> loop?
Your best bet is to replace the disk. A 30-minute wait does seem odd, though; I have a similar situation where one disk is having problems and needs to be reset, but that only stalls the machine for about a minute.

You can mark the disk as failed and replace it before the other disk gives out as well (after all, there is not much point in relying on a faulty disk).

Ciao,
Ariane
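P.S. Roughly what I have in mind, as a sketch only: the raid0 and wd0d names are taken from your log, the partition layout on the replacement disk is an assumption, and you should double-check the flags against raidctl(8) on your release.

    raidctl -f /dev/wd0d raid0    # mark the failing component as failed (the kernel
                                  # already did this for you, judging by the log)
    # physically swap out wd0, recreate a matching 'd' partition on the new disk, then:
    raidctl -R /dev/wd0d raid0    # reconstruct back onto the replaced component
    raidctl -s raid0              # check component and reconstruction status

If you configure a hot spare (raidctl -a) the rebuild target is already in place when a disk dies, though as far as I know you still have to start the reconstruction to it yourself with -F.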