On 9/19/06, Richard Elling - PAE <[EMAIL PROTECTED]> wrote:
> [pardon the digression]
> David Dyer-Bennet wrote:
>> On 9/18/06, Richard Elling - PAE <[EMAIL PROTECTED]> wrote:
>>
>>> Interestingly, the operation may succeed and yet we will get an error
>>> which recommends replacing the drive. For example, if the failure
>>> prediction threshold is exceeded. You might also want to replace the
>>> drive when there are no spare defect sectors available. Life would be
>>> easier if they really did simply die.
>>
>> For one thing, people wouldn't be interested in doing ditto-block data!
>>
>> So, with ditto-block data, you survive any single-block failure, and
>> "most" double-block failures, etc. What it doesn't lend itself to is
>> simple computation of simple answers :-).
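
[Purely illustrative aside: the "doesn't lend itself to simple answers"
point can at least be bounded with a toy model. The sketch below assumes
two copies per logical block and bad blocks striking physical blocks
uniformly at random; it ignores how ZFS actually spreads ditto copies
across the disk. It just shows why one bad block never costs data and
two rarely do.]

import random

def p_data_loss(logical_blocks, bad_blocks, trials=100_000):
    """Fraction of trials in which both copies of some logical block fail."""
    physical = 2 * logical_blocks   # copies i and i + logical_blocks hold block i
    losses = 0
    for _ in range(trials):
        bad = random.sample(range(physical), bad_blocks)
        if len(set(b % logical_blocks for b in bad)) < len(bad):
            losses += 1             # two failures landed on the same logical block
    return losses / trials

print(p_data_loss(1_000, 1))  # always 0.0: one bad block never hits both copies
print(p_data_loss(1_000, 2))  # roughly 1/1999: most double failures are survived
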
>>
>> In theory, and with an infinite budget, I'd approach this analogously
>> to cpu architecture design based on large volumes of instruction trace
>> data. If I had a large volume of disk operation traces with the
>> hardware failures indicated, I could run this against the ZFS
>> simulator and see what strategies produced the most robust single-disk
>> results.
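
[Also illustrative: the trace-replay loop being described could be as
small as the sketch below. The record format and the "one failed read
costs one copy" rule are invented for the example; this is not the ZFS
simulator's interface, just the shape of the experiment.]

from dataclasses import dataclass

@dataclass
class TraceRecord:
    op: str        # "read" or "write"
    block: int     # logical block number
    failed: bool   # did the hardware fail this operation?

def replay(trace, copies):
    """Count logical blocks left with no intact copy after the trace runs."""
    good = {}                          # block -> number of intact copies
    for rec in trace:
        if rec.op == "write":
            good[rec.block] = copies
        elif rec.op == "read" and rec.failed and rec.block in good:
            good[rec.block] = max(0, good[rec.block] - 1)
    return sum(1 for n in good.values() if n == 0)

# Same annotated trace, two strategies: a single copy loses the block,
# a ditto copy survives it.
trace = [TraceRecord("write", 1, False), TraceRecord("read", 1, True)]
print(replay(trace, 1), replay(trace, 2))   # prints: 1 0
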
> There is a significant difference. The functionality of a logic part is
> deterministic and discrete. The wear-out rate of a mechanical device
> is continuous and probabilistic. In the middle are discrete events
> with probabilities associated with them, but they are handled separately.
> In other words, we can use probability and statistics tools to analyze
> data loss in disk drives. This will be much faster and less expensive
> than running a bunch of traces. In fact, much has already been written
> about disk drives, their failure modes, and the factors which
> contribute to their failure rates. We use such data to predict the
> probability of events such as non-recoverable reads (a rate which is
> often specified in the data sheet).
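
[One worked example of that data-sheet arithmetic, with illustrative
numbers: consumer drives are commonly specified at around one
non-recoverable read error per 10^14 bits read. Treating bit errors as
independent (a strong simplification), the chance of hitting at least
one while reading a whole drive comes out as follows.]

import math

def p_unrecoverable_read(bytes_read, bits_per_error=1e14):
    """Probability of at least one non-recoverable read error, assuming
    independent errors at the data-sheet rate (Poisson approximation)."""
    return 1 - math.exp(-(bytes_read * 8) / bits_per_error)

# Reading a 500 GB drive end to end against a 1-in-1e14-bit spec:
print(p_unrecoverable_read(500e9))   # about 0.039, i.e. a few percent
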
Oh, I know there's a difference. It's not as big as it looks, though,
if you remember that the instruction or disk operation traces are just
*representative* of the workload, not the actual workload that has to
run. So, yes, disk failures are certainly non-deterministic, but the
actual instruction stream run by customers isn't the same one the
design was based on, either. In both cases the design has to take the
trace as a general guideline for the types of things that will happen,
rather than as a strict workload to optimize for.
--
David Dyer-Bennet, <mailto:[EMAIL PROTECTED]>, <http://www.dd-b.net/dd-b/>
RKBA: <http://www.dd-b.net/carry/>
Pics: <http://www.dd-b.net/dd-b/SnapshotAlbum/>
Dragaera/Steven Brust: <http://dragaera.info/>
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss