On Wed, Feb 21, 2007 at 03:35:06PM -0700, Gregory Shaw wrote:
> Below is another paper on drive failure analysis, this one won best  
> paper at usenix:
> 
> http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/ 
> index.html
> 
> What I found most interesting was the idea that drives don't fail  
> outright most of the time.   They can slow down operations, and  
> slowly die.

Seems like there are two pieces you're suggesting here:

1. Some sort of background process to proactively find errors on disks
   in use by ZFS.  This will be accomplished by a background scrubbing
   option, dependent on the block-rewriting work Matt and Mark are
   working on.  This will allow something like "zpool set scrub=2weeks",
   which will tell ZFS to "scrub my data at an interval such that all
   data is touched over a 2 week period".  This will test reading from
   every block and verifying checksums.  Stressing write failures is a
   little more difficult.

2. Distinguish "slow" drives from "normal" drives and proactively mark
   them faulted.  This shouldn't require an explicit "zpool dft", as
   we should be watching the response times of the various drives and
   keeping them as a statistic.  We want to incorporate this information
   to allow better allocation amongst slower and faster drives.
   Determining that a drive is "abnormally slow" is much more difficult,
   though it could theoretically be done if we had some basis - either
   historical performance for the same drive or comparison to identical
   drives (manufacturer/model) within the pool.  While we've thought
   about these same issues, there is currently no active effort to keep
   track of these statistics or do anything with them.
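
To make the "comparison to identical drives" idea in item 2 concrete,
here is a rough sketch.  It is not anything ZFS does today; the function
name, the 3x threshold, and the minimum peer count are all assumptions
picked for illustration, and the input is assumed to be a mean response
time already collected per drive.

```python
from statistics import median


def find_slow_drives(latency_ms, model_of, factor=3.0, min_peers=3):
    """Flag drives that are much slower than identical drives in the pool.

    latency_ms: dict of drive name -> mean response time in milliseconds
    model_of:   dict of drive name -> manufacturer/model string
    factor:     hypothetical threshold; a drive is "abnormally slow" if its
                latency exceeds factor times the median of its model group
    min_peers:  hypothetical minimum group size needed to form a baseline
    """
    # Group drives by model so each drive is only compared to identical ones.
    groups = {}
    for drive, model in model_of.items():
        groups.setdefault(model, []).append(drive)

    slow = []
    for drives in groups.values():
        if len(drives) < min_peers:
            continue  # too few identical drives to say what "normal" is
        baseline = median(latency_ms[d] for d in drives)
        for d in drives:
            if latency_ms[d] > factor * baseline:
                slow.append(d)
    return sorted(slow)


latency = {"c0t0d0": 8.0, "c0t1d0": 9.0, "c0t2d0": 41.0,
           "c0t3d0": 8.5, "c1t0d0": 30.0}
models = {"c0t0d0": "A", "c0t1d0": "A", "c0t2d0": "A",
          "c0t3d0": "A", "c1t0d0": "B"}
print(find_slow_drives(latency, models))
```

The median is used as the baseline so that a single bad drive does not
drag the "normal" value up with it.  A lone drive of some model (c1t0d0
here) is never flagged, which is exactly the "no basis" case described
above; historical data for that drive would be needed instead.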

These two things combined should avoid the need for an explicit fitness
test.
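
For item 1, the arithmetic behind "touch all data over a 2 week period"
is simple enough to sketch.  The helper below is hypothetical and only
shows the average read rate a background scrub would need to sustain;
it ignores data written during the interval and any throttling against
regular I/O.

```python
def required_scrub_rate(allocated_bytes, interval_days):
    """Average bytes/second needed to read every allocated byte once
    within interval_days."""
    seconds = interval_days * 24 * 60 * 60
    return allocated_bytes / seconds


# Example: a pool with 10 TiB allocated, scrubbed over 2 weeks.
rate = required_scrub_rate(10 * 2**40, 14)
print(f"{rate / 2**20:.2f} MiB/s")
```

For that example the answer comes out to a bit under 9 MiB/s spread
across the whole pool, which suggests a 2 week interval is a modest
background load for a pool of that size.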

Hope that helps,

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
