> On Thu, 31 Dec 2009, Bob Friesenhahn wrote:
> I like the nice and short answer from this "Bob Friesen" fellow the
> best. :-)
It was succinct, wasn't it?  8-)

Sorry - I pulled the attribution from the ID, not from the
signature, which was waiting below. DOH!

When you say:
> It does not really matter what Solaris or ZFS does if the drive 
> essentially locks up when it is trying to recover a bad sector.
I'd have to say that it depends. If Solaris/ZFS/etc. is restricted
to marking the disk semi-permanently bad and continuing, then yes,
it amounts to the same thing: it opens a yawning chasm of "one more
error and you're dead" until the array can be serviced and
un-degraded. At least, that is what I think happens, based on what
I've read.

However, if OS/S/ZFS/etc. performs an appropriate fire drill, up
to and including logging the issues, quiescing the array, and
annoying the operator, then it closes up the sudden-death window.
This gives the operator of the array a chance to do something
about it, such as swapping in a spare and starting
rebuilding/resilvering/etc.

Given the largish aggregate monetary value to RAIDZ builders of
sidestepping the doubled cost of RAID-specialized drives, it occurs
to me that having a special set of actions for desktop-ish drives
might be a good idea. Something like a fix-the-failed repair mode,
which copies all recoverable data off the purportedly failing drive
and onto a new spare. That would avoid a monster resilver and the
associated window of vulnerability to a second or third failure.
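To make the idea concrete, here is a toy sketch of such a repair mode. It is not how ZFS works and not an existing ZFS feature; it assumes a simple single-parity stripe where a lost block is the XOR of the matching blocks on the other drives, and the names evacuate and xor_blocks are made up for illustration. The point is only that blocks the failing drive can still read are copied straight across, and the other drives are consulted only for the blocks that fail.

```python
from functools import reduce
from typing import List, Optional, Tuple

Block = bytes


def xor_blocks(blocks: List[Block]) -> Block:
    """XOR equal-length blocks together (toy single-parity reconstruction)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def evacuate(
    failing: List[Optional[Block]], peers: List[List[Block]]
) -> Tuple[List[Block], int]:
    """Copy what the failing drive can still read; rebuild the rest from peers.

    failing[i] is None where the drive returned a read error.
    peers holds the other drives in the stripe (data plus parity).
    Returns (spare_contents, number_of_blocks_rebuilt_from_peers).
    """
    spare: List[Block] = []
    rebuilt = 0
    for i, block in enumerate(failing):
        if block is None:
            # Hypothetical fallback: only now do we touch the other drives.
            block = xor_blocks([peer[i] for peer in peers])
            rebuilt += 1
        spare.append(block)
    return spare, rebuilt
```

In this model a drive with a handful of bad sectors costs a handful of reconstructions rather than a full pass over every other disk, which is where the shorter vulnerable time would come from.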

Viewed in that light, two things seem really important: exactly
what OS/S/ZFS does when a disk takes an extended time to reply, and
exactly what can be done to minimize the time the array runs in a
degraded mode where the next failure loses the data.

Well, OK, it does to me because my purpose here is getting to 
background scrubbing of errors in the disks. Other things might
be more important to others.  8-)

And the question might be moot if the SMART SCT architecture in
desktop drives lets you do a power-on hack to shorten the reply-failed
time for better RAID operation. That's actually the solution I'd like
to see in a perfect world - I get back to a redundant array of INEXPENSIVE
disks, and I can pick those disks to be big and slow/low power instead
of fast/high power.
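As a rough sketch of what such a power-on hack could look like: smartmontools exposes the SCT Error Recovery Control timers through smartctl's "-l scterc,READTIME,WRITETIME" option, with both times given in units of 100 ms. The helper below only builds that command line; the function name and the /dev/sdX device path are placeholders, and it assumes a smartmontools build that supports the scterc option. Whether a given desktop drive accepts the command, and whether the setting survives a power cycle, are open questions, so a boot-time script is assumed.

```python
from typing import List


def scterc_command(device: str, read_seconds: float, write_seconds: float) -> List[str]:
    """Build a smartctl call that sets the SCT Error Recovery Control timers.

    smartctl takes the limits in units of 100 ms, so 7.0 seconds becomes 70.
    """
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    if read_ds < 0 or write_ds < 0:
        raise ValueError("timeouts must not be negative")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]
```

Calling scterc_command("/dev/sdX", 7.0, 7.0) gives the argument list for "smartctl -l scterc,70,70 /dev/sdX", which could then be run once per drive from a startup script. The 7-second figure is just an example value, not a recommendation from this thread.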

I'd welcome any enlightened speculation on this. I do recognize that
I'm an idiot on these matters compared to people with actual 
experience. 8-)
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
