On Wed, Feb 21, 2007 at 04:20:58PM -0800, Eric Schrock wrote:
> Seems like there are two pieces you're suggesting here:
>
> 1. Some sort of background process to proactively find errors on disks
>    in use by ZFS. This will be accomplished by a background scrubbing
>    option, dependent on the block-rewriting work Matt and Mark are
>    working on. This will allow something like "zpool set scrub=2weeks",
>    which will tell ZFS to "scrub my data at an interval such that all
>    data is touched over a 2 week period". This will test reading from
>    every block and verifying checksums. Stressing write failures is a
>    little more difficult.
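For scale, the background read rate that implies is easy to estimate.
A back-of-the-envelope sketch (Python, with a made-up pool size; this
is just the arithmetic, not how the scrubber itself would pace its
I/O):

    def scrub_rate(allocated_bytes, interval_days):
        """Average read rate (bytes/sec) needed to touch every
        allocated block once per interval."""
        return allocated_bytes / float(interval_days * 86400)

    # Example: 2 TB allocated, 14-day interval -> roughly 1.7 MB/s
    # of sustained scrub reads.
    print("%.1f MB/s" % (scrub_rate(2 * 1024**4, 14) / 1024**2))

So for a mostly idle pool the steady-state cost looks small; the hard
part is backing off when the pool is busy.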
I got the impression that testing free disk space was also desired.

> 2. Distinguish "slow" drives from "normal" drives and proactively mark
>    them faulted. This shouldn't require an explicit "zpool dft", as
>    we should be watching the response times of the various drives and
>    keeping this as a statistic. We want to incorporate this
>    information to allow better allocation amongst slower and faster
>    drives. Determining that a drive is "abnormally slow" is much more
>    difficult, though it could theoretically be done if we had some
>    basis - either historical performance for the same drive or
>    comparison to identical drives (manufacturer/model) within the
>    pool. While we've thought about these same issues, there is
>    currently no active effort to keep track of these statistics or do
>    anything with them.

I would imagine that "slow" as in "long average seek times" should be
relatively easy to detect, whereas "slow" as in "low bandwidth" might
be harder, since I/O bandwidth may depend on characteristics of the
device path and how saturated it is.

Are long average seek times an indication of trouble?

Nico
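P.S. To make the peer-comparison idea concrete, here is a rough sketch
(illustrative only; the decaying average, the threshold, and the
grouping by manufacturer/model are invented parameters, not anything
ZFS does today):

    ALPHA = 0.1        # weight of each new latency sample in the EMA
    SLOW_FACTOR = 3.0  # flag a drive 3x slower than its peers' median

    class DriveStats(object):
        def __init__(self, name):
            self.name = name
            self.avg_latency = None   # exponential moving average, sec

        def record(self, latency):
            # Fold each observed response time into the decaying
            # average, so old history gradually stops mattering.
            if self.avg_latency is None:
                self.avg_latency = latency
            else:
                self.avg_latency += ALPHA * (latency - self.avg_latency)

    def median(values):
        s = sorted(values)
        return s[len(s) // 2]

    def abnormally_slow(drives):
        """Drives whose average latency exceeds SLOW_FACTOR times the
        median of identical (manufacturer/model) peers."""
        med = median([d.avg_latency for d in drives])
        return [d for d in drives if d.avg_latency > SLOW_FACTOR * med]

Something like this would catch one drive drifting away from its
peers, but not a whole batch degrading together, which is where the
historical-baseline approach would have to come in.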