[pardon the digression]
David Dyer-Bennet wrote:
> On 9/18/06, Richard Elling - PAE <[EMAIL PROTECTED]> wrote:
>> Interestingly, the operation may succeed and yet we will get an error
>> which recommends replacing the drive. For example, if the failure
>> prediction threshold is exceeded. You might also want to replace the
>> drive when there are no spare defect sectors available. Life would be
>> easier if they really did simply die.
> For one thing, people wouldn't be interested in doing ditto-block data!
> So, with ditto-block data, you survive any single-block failure, and
> "most" double-block failures, etc. What it doesn't lend itself to is
> simple computation of simple answers :-).
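
To make that concrete, here is a minimal back-of-the-envelope sketch.
It assumes each copy of a block becomes unreadable independently with
the same probability p -- real failures are correlated (same head, same
track, same vintage), which is exactly why the simple computation stops
being simple:

    # Illustration only (not ZFS code): probability that every copy of a
    # block is unreadable, assuming each copy fails independently with
    # the same probability p.
    def block_loss_probability(p, copies):
        return p ** copies

    p = 1e-9                    # assumed per-copy failure probability
    for copies in (1, 2, 3):
        print(copies, "copies:", block_loss_probability(p, copies))
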
> In theory, and with an infinite budget, I'd approach this analogously
> to CPU architecture design based on large volumes of instruction trace
> data. If I had a large volume of disk operation traces with the
> hardware failures indicated, I could run this against the ZFS
> simulator and see what strategies produced the most robust single-disk
> results.
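
For what it's worth, the replay loop itself is easy to picture; the hard
part is getting failure-annotated traces. The trace format and strategy
object below are invented purely for illustration -- no such ZFS
simulator interface is implied:

    # Hypothetical sketch of the trace-replay idea.
    def score_strategy(trace, strategy):
        """Count logical blocks lost when replaying a failure-annotated trace."""
        lost = 0
        for op in trace:                    # e.g. {"lba": 123, "failed": True}
            if op["failed"] and not strategy.can_recover(op["lba"]):
                lost += 1
        return lost

    # Usage (hypothetical):
    #   best = min(strategies, key=lambda s: score_strategy(trace, s))
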
There is a significant difference. The functionality of a logic part is
deterministic and discrete. The wear-out rate of a mechanical device
is continuous and probabilistic. In the middle are discrete events
with probabilities attached to them, but those are handled separately.
In other words, we can use probability and statistics tools to analyze
data loss in disk drives. This will be much faster and less expensive
than running a bunch of traces. In fact, much has already been
written about disk drives, their failure modes, and the factors which
contribute to their failure rates. We use such data to predict the
probability of events such as non-recoverable reads (whose rate is
often specified in the data sheet).
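
As a purely illustrative example of working from the data sheet,
assuming errors are independent and uniformly likely per bit read (the
usual simplifying assumption, not something the data sheet promises),
and with a made-up rate and capacity:

    import math

    # Illustration only: turn a data-sheet non-recoverable read rate into
    # the chance of hitting at least one such error while reading a whole
    # drive once.  The numbers below are assumed, not quoted from any
    # particular product.
    errors_per_bit = 1e-14          # assumed spec: 1 error per 1e14 bits read
    capacity_bytes = 500e9          # assumed 500 GB drive
    bits_read = capacity_bytes * 8

    p_at_least_one = 1 - math.exp(bits_read * math.log1p(-errors_per_bit))
    print("P(>=1 non-recoverable read over a full pass): %.3f" % p_at_least_one)

With those assumed numbers the answer comes out to roughly 4%, which is
why a single full-disk pass without protection is not as safe as people
tend to expect.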
-- richard