Ariane van der Steldt wrote:
On Thu, Aug 07, 2008 at 11:41:59AM +0200, Harald Dunkel wrote:
Ariane van der Steldt wrote:
Your best bet is to replace the disk. 30 minutes wait time seems a bit
odd though. I have a similar situation where one disk is having
problems, requiring the disk to restart, but that only takes approx. a
minute. You can mark the disk as bad and replace it before the other
disk fails, I guess (after all, there's not much point in relying on a
faulty disk).
The problem is not replacing the disk, but how to avoid
30 minutes downtime due to some low level kernel routine
getting stuck.
Mark it as a bad disk? If you do that, the raid code should issue no
further requests to the disk.
Seems we have some misunderstanding here. I am talking about
future events. Of course I don't know in advance which disk
fails when. If a disk dies, then it's the job of raidframe to
detect this event, to mark the disk as bad, and to provide the
basic service with the remaining disks, as far as possible.
Looking at the log file it seems that raidframe _did_ mark
the disk as bad:
:
Aug 2 17:53:52 fw01 /bsd: raid0: IO Error. Marking /dev/wd0d as failed.
Aug 2 17:53:52 fw01 /bsd: raid0: node (Wpd) returned fail, rolling forward
:
And yet the machine became unresponsive for 30 minutes.
This took much too long.
Regards
Harri