Peter Schuller wrote:
> Hello,
>
> There have been comparisons posted here (and in general out there on the
> net) for various RAID levels and the chances of e.g. double failures. One
> problem that is rarely addressed, though, is the various edge cases that
> significantly impact the probability of loss of data.
I agree 110%
> In particular, I am concerned about the relative likelihood of bad sectors
> on a drive vs. entire-drive failure. On a raidz where uptime is not
> important, I would not want a dead drive + a single bad sector on another
> drive to cause loss of data, yet dead drive + bad sector is going to be a
> lot more likely than two dead drives within the same time window.
I covered that topic in http://blogs.sun.com/relling/entry/a_story_of_two_mttdl
where I described a model which does take into account the probability of an
unrecoverable read during reconstruction. In general, the problem is that the
average person can't get some of the detailed information that a more complex
model would require. The models I describe use data that you can glean from
datasheets, or follow a reasonable extrapolation from historical data.
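To put some numbers on it, here is a back-of-envelope sketch in Python of the
two kinds of calculation. The drive parameters below (MTBF, capacity,
unrecoverable error rate, resilver time) are made-up, datasheet-style values,
and the formulas are the usual simple approximations, not necessarily the
exact model in the blog entry:

# Back-of-envelope comparison: MTTDL counting only whole-drive failures vs.
# MTTDL that also counts an unrecoverable read during reconstruction as loss.
# All inputs below are assumed, datasheet-style numbers, not measurements.

N         = 6             # disks in the raidz group
MTBF      = 1.0e6         # drive MTBF, hours
MTTR      = 24.0          # hours to resilver onto a replacement
SIZE_BITS = 500e9 * 8     # bits per drive (500 GB)
UER       = 1e-14         # unrecoverable read errors per bit read

# Classic single-parity MTTDL: data loss requires a second whole-drive
# failure inside the repair window.
mttdl_drives_only = MTBF ** 2 / (N * (N - 1) * MTTR)

# Probability that reading the N-1 survivors end to end during the resilver
# trips at least one unrecoverable read error.
p_ure = 1.0 - (1.0 - UER) ** ((N - 1) * SIZE_BITS)

# MTTDL when a bad sector found during reconstruction also counts as loss:
# the first drive fails at rate N/MTBF and the rebuild then fails with p_ure.
mttdl_with_ure = MTBF / (N * p_ure)

print("P(unrecoverable read during resilver): %.2f" % p_ure)
print("MTTDL, whole-drive failures only:      %.2e hours" % mttdl_drives_only)
print("MTTDL, including unrecoverable reads:  %.2e hours" % mttdl_with_ure)

With those assumed numbers, the chance of tripping over a bad sector during
the resilver comes out around 18%, versus roughly 0.01% for a second
whole-drive failure in the same window, which is exactly the asymmetry
described above.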
> In many situations it may not feel worth it to move to a raidz2 just to
> avoid this particular case.
I can't think of any, but then again, I get paid to worry about failures :-)
> I would like a pool policy that allowed one to specify that at the moment a
> disk fails (where "fails" = considered faulty), all mutable I/O would be
> immediately stopped (returning I/O errors to userspace, I presume), and any
> transaction in the process of being committed is rolled back. The result is
> that the drive that just failed completely will not go out of date
> immediately.
>
> If one then triggers a bad block on another drive while resilvering with a
> replacement drive, you know that you have the failed drive as a last resort
> (given that a full-drive failure is unlikely to mean the drive was
> physically obliterated; perhaps the controller circuitry or certain other
> physical components can be replaced). In the case of raidz2, you
> effectively have another "half" level of redundancy.
Please correct me if I misunderstand your reasoning: are you saying that a
broken disk should not be replaced? If so, then that is contrary to the
accepted methods used in most mission-critical systems. There may be other
methods which meet your requirements and are accepted. For example, one
procedure we see at sites which are very interested in data retention is to
power off a system when it is degraded to a specified point where data
retention is put at unacceptable risk. The theory is that a powered-down
system will stop wearing out. When the system is serviced, it can be brought
back online. Obviously, this does not suit environments where data
availability is the primary requirement; it is for sites where data retention
has the higher priority.
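Something along those lines can be scripted outside of ZFS today. The
following is only a rough sketch of such a watcher: the zpool invocation is
standard, but the shutdown command and the "anything other than ONLINE"
trigger are assumptions that each site would tune.

#!/usr/bin/env python3
# Sketch of a "power off when degraded" watcher.  Polls pool health via
# zpool(1M) and halts the machine as soon as any pool leaves the ONLINE state.
import subprocess
import time

POLL_SECONDS = 60

def pool_health():
    """Return {pool_name: health} as reported by `zpool list`."""
    out = subprocess.run(["zpool", "list", "-H", "-o", "name,health"],
                         capture_output=True, text=True, check=True).stdout
    return dict(line.split("\t") for line in out.splitlines() if line)

while True:
    degraded = [name for name, health in pool_health().items()
                if health != "ONLINE"]
    if degraded:
        print("pools not ONLINE: %s; powering off to preserve data"
              % ", ".join(degraded))
        # Solaris-style power-off; substitute the site's preferred command.
        subprocess.run(["shutdown", "-y", "-g0", "-i5"])
        break
    time.sleep(POLL_SECONDS)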
> Also, with either raidz/raidz2 one can imagine cases where a machine is
> booted with one or two drives missing (due to cabling issues, for example);
> guaranteeing that no pool is ever online for writable operations (thus
> making absent drives out of date) until the administrator explicitly asks
> for it would greatly reduce the probability of data loss due to a bad block
> in this case as well.
>
> In short, if true irrevocable data loss is limited (assuming no software
> issues) to the complete obliteration of all data on n drives (for n levels
> of redundancy), or alternatively to the unlikely event of bad blocks
> coinciding on multiple drives, wouldn't reliability be significantly
> increased in cases where this is an acceptable practice?
>
> Opinions?
We can already set a pool (actually the file systems in a pool) to be read only.
I think what may be lurking here is the fact that not all of the RAS features
are implemented yet, hence an issue such as a corrupted zpool may still cause
a panic. Clearly, such issues are short term and will be fixed anyway.
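For example, against a hypothetical pool named tank, freezing writes by hand
looks something like this (only a partial answer to the request above, since
it is a manual or scripted step rather than a pool policy):

import subprocess

# Mark the hypothetical pool "tank" read-only; child filesystems inherit the
# property, so this freezes mutable I/O for the whole pool until an
# administrator runs `zfs set readonly=off tank`.
subprocess.run(["zfs", "set", "readonly=on", "tank"], check=True)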
There may be something else lurking here that we might be able to take
advantage of, at least for some cases. Since ZFS is COW, it doesn't have the
same "data loss" profile as other file systems, like UFS, which can overwrite
data in place, making reconstruction difficult or impossible. But while this
might be useful for forensics, the general case is perhaps largely covered by
the existing snapshot features.
N.B. I do have a lot of field data on failures and failure rates. It is often
difficult to grok without having a clear objective in mind. We may be able to
agree on a set of questions which would quantify the need for your ideas. Feel
free to contact me directly.
-- richard