>>>>> "jl" == Jonathan Loran <[EMAIL PROTECTED]> writes:

    jl>   Fe = 46% failures/month * 12 months = 5.52 failures

The original statistic wasn't of this kind.  It was ``the likelihood
that a single drive will experience one or more silent corruptions
within 12 months''.

So you could say, ``If I have a thousand drives, about 4.66 of those
drives will silently corrupt at least once within 12 months.''  The
per-drive probability is 0.466% no matter how many drives you have.
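
(A quick Python sketch of that arithmetic, just to keep the numbers
straight; the 0.466% is the per-drive figure quoted above:)

    p = 0.00466   # per-drive probability of at least one silent
                  # corruption within 12 months (figure quoted above)
    n = 1000      # drives in the population

    expected_affected_drives = n * p
    print(expected_affected_drives)   # 4.66: an absolute count that grows
                                      # with n, while p itself never changes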

And it's 4.66 drives, not 4.66 corruptions.  The expected number of
corruption events is higher, because some drives will corrupt twice,
or thousands of times.  It's not a BER, so you can't just add it up
the way Richard did.
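
(To illustrate the drives-vs-corruptions distinction, here's a toy
sketch; the per-drive event counts are invented for illustration, since
the paper only reports which drives saw at least one corruption:)

    # Five hypothetical affected drives and how many times each one
    # corrupted; the counts are made up.
    events_per_affected_drive = [1, 1, 2, 1, 3000]

    affected_drives = len(events_per_affected_drive)     # 5 drives
    total_corruptions = sum(events_per_affected_drive)   # 3005 events
    print(affected_drives, total_corruptions)

    # The drive count stays small even when one drive corrupts thousands
    # of times, which is why 4.66 drives is not 4.66 corruptions.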

If the original statistic in the paper were of the kind you're talking
about, it would be larger than 0.466%.  I'm not sure it would capture
the situation well, though.  I think you'd want to talk about bits of
recoverable data after one year, not corruption ``events'', and this
is not really measured well by the type of telemetry NetApp has.  If
it were, though, it would still be the same size number no matter how
many drives you had.

The 37% I gave was the probability that ``one or more drives within a
population of 100 silently corrupts within 12 months.''  The 46%
Richard gave has no statistical meaning, and it doesn't mean what you
just said.  The only statistic under discussion which (a) gets
intimidatingly large as you increase the number of drives, and (b) is
a ratio rather than, say, an absolute number of bits, is the one I
gave.
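
(For what it's worth, here's how the two numbers compare in Python,
assuming drives corrupt independently, and assuming the 46% came from
adding per-drive probabilities, which is only my guess at how it was
produced:)

    p = 0.00466   # per-drive probability, 12 months
    n = 100       # drives in the population

    p_at_least_one = 1 - (1 - p) ** n   # ~0.373, the 37% figure
    naive_sum = n * p                    # 0.466, adding probabilities,
                                         # which stops being a probability
                                         # of anything as n grows
    print(p_at_least_one, naive_sum)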
