> On Thu, 31 Dec 2009, Bob Friesenhahn wrote:
> I like the nice and short answer from this "Bob Friesen" fellow the
> best. :-)

It was succinct, wasn't it? 8-)
Sorry - I pulled the attribution from the ID, not the signature which was waiting below. DOH!

When you say:

> It does not really matter what Solaris or ZFS does if the drive
> essentially locks up when it is trying to recover a bad sector.

I'd have to say that it depends. If Solaris/zfs/etc. is restricted to actions which consist of marking the disk semi-permanently bad and continuing, then yes, it amounts to the same thing: it opens a yawning chasm of "one more error and you're dead" until the array can be serviced and un-degraded. At least I think it does, based on what I've read, anyway.

However, if OS/S/zfs/etc. performs an appropriate fire drill, up to and including logging the issues, quiescing the array, and annoying the operator, then it closes up the sudden-death window. This gives the operator of the array a chance to do something about it, such as swapping in a spare and starting rebuilding/resilvering/etc.

Given the largish aggregate monetary value to RAIDZ builders of sidestepping the doubled cost of RAID-specialized drives, it occurs to me that having a special set of actions for desktop-ish drives might be a good idea. Something like a fix-the-failed repair mode which pulls all recoverable data off the purportedly failing drive and onto a new spare, to avoid a monster resilvering and the associated time spent vulnerable to a second or third failure.

Viewed in that light, exactly what OS/S/zfs does on a long, extended reply from a disk, and exactly what can be done to minimize the time when the array runs in a degraded mode where the next failure loses the data, seems to be a really important issue. Well, OK, it does to me, because my purpose here is getting to background scrubbing of errors in the disks. Other things might be more important to others. 8-)

And the question might be moot if the SMART SCT architecture in desktop drives lets you do a power-on hack to shorten the reply-failed time for better RAID operation.
That's actually the solution I'd like to see in a perfect world - I get back to a redundant array of INEXPENSIVE disks, and I can pick those disks to be big and slow/low power instead of fast/high power. I'd welcome any enlightened speculation on this. I do recognize that I'm an idiot on these matters compared to people with actual experience. 8-)
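
To make the "fix-the-failed" repair idea a bit more concrete, here is a toy sketch of it. This is only an illustration under loud assumptions: it is not how ZFS or Solaris actually behaves, a "drive" is just a Python list of small integers, an unreadable sector is modelled as None, and the names evacuate_failing_drive and from_parity are made up for the example. The redundancy is plain XOR parity across three data drives, which is RAIDZ1-like in spirit only.

```python
# Toy model (hypothetical, not ZFS code): copy what the failing drive can
# still read, and fall back to redundancy only for the sectors it cannot.

def evacuate_failing_drive(failing, rebuild_from_redundancy):
    """Return (spare_contents, number_of_blocks_rebuilt_from_redundancy)."""
    spare = []
    rebuilt = 0
    for index, block in enumerate(failing):
        if block is None:                       # sector the drive cannot return
            block = rebuild_from_redundancy(index)
            rebuilt += 1
        spare.append(block)                     # readable blocks are copied as-is
    return spare, rebuilt


# Two healthy data drives, plus the original contents of the failing one.
good_a = [0x11, 0x22, 0x33, 0x44]
good_b = [0x0F, 0xF0, 0x55, 0xAA]
original = [0x01, 0x02, 0x03, 0x04]

# Single XOR parity computed while all three data drives were healthy.
parity = [a ^ b ^ c for a, b, c in zip(good_a, good_b, original)]

# The failing drive now has one unreadable sector; the rest still read fine.
failing = [0x01, None, 0x03, 0x04]


def from_parity(i):
    # XOR of the surviving data blocks and the parity block recovers block i.
    return good_a[i] ^ good_b[i] ^ parity[i]


spare, rebuilt = evacuate_failing_drive(failing, from_parity)
```

In this toy case only one of the four blocks has to be reconstructed from the other drives; the remaining three are simply copied. That is the whole point of the idea: the other disks get read only for the sectors the failing drive cannot deliver, rather than for a full resilver.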