Peter Schuller wrote:
> Hello,
>
> There have been comparisons posted here (and in general out there on the
> net) for various RAID levels and the chances of e.g. double failures. One
> problem that is rarely addressed, though, is the various edge cases that
> significantly impact the probability of loss of data.
I agree 110%
> In particular, I am concerned about the relative likelihood of bad sectors
> on a drive vs. entire-drive failure. On a raidz where uptime is not
> important, I would not want a dead drive + a single bad sector on another
> drive to cause loss of data, yet dead drive + bad sector is going to be a
> lot more likely than two dead drives within the same time window.
I covered that topic in http://blogs.sun.com/relling/entry/a_story_of_two_mttdl
where I described a model which does take into account the probability of an
unrecoverable read during reconstruction. In general, the problem is that the
average person can't get some of the detailed information that a more complex
model would require. The models I describe use data that you can glean from
datasheets, or follow a reasonable extrapolation from historical data.
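To put some numbers on it, here is a back-of-envelope sketch in Python of the
two kinds of calculation. The drive parameters below (MTBF, capacity,
unrecoverable error rate, resilver time) are made-up, datasheet-style values,
and the formulas are the usual simple approximations, not necessarily the
exact model in the blog entry:

# Back-of-envelope comparison: MTTDL counting only whole-drive failures vs.
# MTTDL that also counts an unrecoverable read during reconstruction as loss.
# All inputs below are assumed, datasheet-style numbers, not measurements.

N         = 6             # disks in the raidz group
MTBF      = 1.0e6         # drive MTBF, hours
MTTR      = 24.0          # hours to resilver onto a replacement
SIZE_BITS = 500e9 * 8     # bits per drive (500 GB)
UER       = 1e-14         # unrecoverable read errors per bit read

# Classic single-parity MTTDL: data loss requires a second whole-drive
# failure inside the repair window.
mttdl_drives_only = MTBF ** 2 / (N * (N - 1) * MTTR)

# Probability that reading the N-1 survivors end to end during the resilver
# trips at least one unrecoverable read error.
p_ure = 1.0 - (1.0 - UER) ** ((N - 1) * SIZE_BITS)

# MTTDL when a bad sector found during reconstruction also counts as loss:
# the first drive fails at rate N/MTBF and the rebuild then fails with p_ure.
mttdl_with_ure = MTBF / (N * p_ure)

print("P(unrecoverable read during resilver): %.2f" % p_ure)
print("MTTDL, whole-drive failures only:      %.2e hours" % mttdl_drives_only)
print("MTTDL, including unrecoverable reads:  %.2e hours" % mttdl_with_ure)

With those assumed numbers, the chance of tripping over a bad sector during
the resilver comes out around 18%, versus roughly 0.01% for a second
whole-drive failure in the same window, which is exactly the asymmetry
described above.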
> In many situations it may not feel worth it to move to a raidz2 just to
> avoid this particular case.
I can't think of any, but then again, I get paid to worry about failures :-)
> I would like a pool policy that allowed one to specify that at the moment a
> disk fails (where "fails" = considered faulty), all mutable I/O would be
> immediately stopped (returning I/O errors to userspace, I presume), and any
> transaction in the process of being committed is rolled back. The result is
> that the drive that just failed completely will not go out of date
> immediately.
>
> If one then triggers a bad block on another drive while resilvering with a
> replacement drive, you know that you have the failed drive as a last resort
> (given that a full-drive failure is unlikely to mean the drive was
> physically obliterated; perhaps the controller circuitry or certain other
> physical components can be replaced). In the case of raidz2, you
> effectively have another "half" level of redundancy.
Please correct me if I misunderstand your reasoning: are you saying that a
broken disk should not be replaced? If so, then that is contrary to the
accepted methods used in most mission-critical systems. There may be other
methods which meet your requirements and are accepted. For example, one
procedure we see at sites which are very interested in data retention is to
power off a system when it is degraded to a specified point where data
retention is put at unacceptable risk. The theory is that a powered-down
system will stop wearing out. When the system is serviced, it can be brought
back online. Obviously, this does not suit environments where data
availability is the primary requirement; it is for sites where data retention
has the higher priority.
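Something along those lines can be scripted outside of ZFS today. The
following is only a rough sketch of such a watcher: the zpool invocation is
standard, but the shutdown command and the "anything other than ONLINE"
trigger are assumptions that each site would tune.

#!/usr/bin/env python3
# Sketch of a "power off when degraded" watcher.  Polls pool health via
# zpool(1M) and halts the machine as soon as any pool leaves the ONLINE state.
import subprocess
import time

POLL_SECONDS = 60

def pool_health():
    """Return {pool_name: health} as reported by `zpool list`."""
    out = subprocess.run(["zpool", "list", "-H", "-o", "name,health"],
                         capture_output=True, text=True, check=True).stdout
    return dict(line.split("\t") for line in out.splitlines() if line)

while True:
    degraded = [name for name, health in pool_health().items()
                if health != "ONLINE"]
    if degraded:
        print("pools not ONLINE: %s; powering off to preserve data"
              % ", ".join(degraded))
        # Solaris-style power-off; substitute the site's preferred command.
        subprocess.run(["shutdown", "-y", "-g0", "-i5"])
        break
    time.sleep(POLL_SECONDS)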
> Also, with either raidz/raidz2 one can imagine cases where a machine is
> booted with one or two drives missing (due to cabling issues, for example);
> guaranteeing that no pool is ever online for writable operations (thus
> making absent drives out of date) until the administrator explicitly asks
> for it would greatly reduce the probability of data loss due to a bad block
> in this case as well.
>
> In short, if true irrevocable data loss is limited (assuming no software
> issues) to the complete obliteration of all data on n drives (for n levels
> of redundancy), or alternatively to the unlikely event of bad blocks
> coinciding on multiple drives, wouldn't reliability be significantly
> increased in cases where this is an acceptable practice?
>
> Opinions?
We can already set a pool (actually the file systems in a pool) to be read only.
I think what may be lurking here is the fact that not all of the RAS features
are implemented yet, hence an issue such as a corrupted zpool may still cause
a panic. Clearly, such issues are short term and will be fixed anyway.
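For example, against a hypothetical pool named tank, freezing writes by hand
looks something like this (only a partial answer to the request above, since
it is a manual or scripted step rather than a pool policy):

import subprocess

# Mark the hypothetical pool "tank" read-only; child filesystems inherit the
# property, so this freezes mutable I/O for the whole pool until an
# administrator runs `zfs set readonly=off tank`.
subprocess.run(["zfs", "set", "readonly=on", "tank"], check=True)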
There may be something else lurking here that we might be able to take
advantage of, at least for some cases. Since ZFS is COW, it doesn't have the
same "data loss" profile as other file systems, like UFS, which can overwrite
data in place, making reconstruction difficult or impossible. But while this
might be useful for forensics, the general case is perhaps largely covered by
the existing snapshot features.
N.B. I do have a lot of field data on failures and failure rates. It is often
difficult to grok without having a clear objective in mind. We may be able to
agree on a set of questions which would quantify the need for your ideas. Feel
free to contact me directly.
-- richard