On Mon, 21 Sep 2009 14:02:40 EDT erik quanstrom <quans...@quanstro.net>  wrote:
> > > i would think this is acceptable.  at these low levels, something
> > > else is going to get you -- like drives failing unindependently.
> > > say because of power problems.
> > 
> > 8% rate for an array rebuild may or may not be acceptable
> > depending on your application.
> 
> i think the lesson here is don't buy cheap drives; if you
> have enterprise drives at 1e-15 error rate, the fail rate
> will be 0.8%.  of course if you don't have a raid, the fail
> rate is 100%.
>
> if that's not acceptable, then use raid 6.

Hopefully RAID 6 or ZFS's raidz2 works well enough with cheap
drives!

> > > so there are 4 ways to fail.  3 double fail have a probability of
> > > 3*(2^9 bits * 1e-14 1/ bit)^2
> > 
> > Why 2^9 bits? A sector is 2^9 bytes or 2^12 bits.
>
> 
> cut-and-paste error.  sorry that was 2^19 bits, i.e. 64k*8 bits/byte.
> the calculation is still correct, since it was done on that basis.

Ok.
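
For anyone who wants to redo the arithmetic, here is a minimal
sketch of the double-failure estimate quoted above.  It assumes
independent errors, a 64k chunk and a 1e-14 per-bit error rate;
the function and constant names are mine, for illustration only.

```python
# Rough double-failure estimate, assuming independent bit errors.
UER = 1e-14                  # uncorrectable error rate per bit (from the thread)
CHUNK_BITS = 64 * 1024 * 8   # 64k chunk = 2^19 bits

def double_fail_probability(bits, uer=UER, pairs=3):
    """Probability that any one of `pairs` disk pairs both fail to read `bits`."""
    p_single = bits * uer    # linear approximation, valid while bits * uer << 1
    return pairs * p_single ** 2

p_chunk = double_fail_probability(CHUNK_BITS)
print(f"double failure per 64k chunk: {p_chunk:.3e}")
```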

> > If per sector recovery is done, you have
> >     3E-22*(64K/512) = 3.84E-20
> 
> i'd be interested to know if anyone does this.  it's not
> as easy as it would first appear.  do you know of any
> hardware or software that does sector-level recovery?

No idea -- I haven't really looked in this area in ages.  In
case of two stripes being bad it would make sense to me to
reread a stripe one sector at a time, since the chance of the
exact same sector being bad on two disks is much lower (about
2^14 times smaller for 64k stripes?).  I don't know if disk
drives return an error bit array along with the data of a
multisector read (nth bit is set if nth sector could not be
recovered).  If not, that would be a worthwhile addition.
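
A quick check of the 2^14 figure, as a sketch that assumes
independent errors and 512-byte sectors (names are illustrative).
The 2^14 ratio is for one given sector position; summed over all
sector positions in a chunk the net gain would be closer to 2^7.

```python
# Compare coincident failures at chunk vs. sector granularity.
SECTOR_BITS = 512 * 8        # 2^12 bits
CHUNK_BITS = 64 * 1024 * 8   # 2^19 bits

def coincident_failure_ratio(chunk_bits, sector_bits):
    """How much less likely two disks are to both lose one given sector
    than to both lose one given chunk (probability scales with bits^2)."""
    return (chunk_bits / sector_bits) ** 2

per_sector_ratio = coincident_failure_ratio(CHUNK_BITS, SECTOR_BITS)
sectors_per_chunk = CHUNK_BITS // SECTOR_BITS

# Any of the sector positions in the chunk may be the coincident one,
# so the gain over the whole chunk is smaller by that factor.
per_chunk_ratio = per_sector_ratio / sectors_per_chunk

print(per_sector_ratio, sectors_per_chunk, per_chunk_ratio)
```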

> i don't have enough data to know how likely it is to
> have exactly 1 bad sector.  any references?

Not sure what you are asking.  Reed-Solomon codes are block
codes, applied to a whole sector, so the per sector error rate
is UER*512*8 where UER == uncorrectable error rate. [Early IDE
disks had 4 byte ECC per sector.  Now that bits are packed so
tight, S/N ratio is far worse and ECC is at least 40 bytes,
to keep UER to 1E-14 or whatever is the target].
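
As a small worked example of the per sector rate, assuming
512-byte sectors and the 1e-14 and 1e-15 figures mentioned in
this thread (the function name is made up for illustration):

```python
# Per-sector unrecoverable read probability from a per-bit UER.
def sector_error_rate(uer, sector_bytes=512):
    """Linear estimate UER * bytes * 8; fine while the result is << 1."""
    return uer * sector_bytes * 8

cheap = sector_error_rate(1e-14)        # consumer-class figure from the thread
enterprise = sector_error_rate(1e-15)   # enterprise-class figure from the thread

print(f"1e-14 UER: {cheap:.3e} per sector")
print(f"1e-15 UER: {enterprise:.3e} per sector")
```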
