On Mon, 21 Sep 2009 14:02:40 EDT erik quanstrom <quans...@quanstro.net> wrote:
> > > i would think this is acceptable. at these low levels, something
> > > else is going to get you -- like drives failing unindependently.
> > > say because of power problems.
> >
> > 8% rate for an array rebuild may or may not be acceptable
> > depending on your application.
>
> i think the lesson here is don't buy cheap drives; if you
> have enterprise drives at 1e-15 error rate, the fail rate
> will be 0.8%. of course if you don't have a raid, the fail
> rate is 100%.
>
> if that's not acceptable, then use raid 6.
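A rough illustration of why the 8% and 0.8% figures differ by about 10x: if unrecoverable read errors are independent, the chance of hitting at least one while reading N bits is 1 - (1 - UER)^N, which is close to N*UER when that product is small. The sketch below assumes, purely for illustration, that a rebuild reads 10^12 bytes (about 1 TB) from the surviving drives; the actual array size behind the 8% figure is not stated here, and `rebuild_failure_prob` is a hypothetical helper name.

```python
import math

def rebuild_failure_prob(bits_read: float, uer: float) -> float:
    """P(at least one unrecoverable read error) when reading bits_read bits,
    assuming independent errors at rate uer per bit: 1 - (1 - uer)**bits_read.
    log1p/expm1 keep the result accurate when uer is tiny."""
    return -math.expm1(bits_read * math.log1p(-uer))

# Hypothetical rebuild size (an assumption): 10**12 bytes (~1 TB) read back.
BITS_READ = 10**12 * 8

for uer in (1e-14, 1e-15):  # cheap-drive vs enterprise spec-sheet error rates
    p = rebuild_failure_prob(BITS_READ, uer)
    print(f"UER {uer:.0e}: {p:.2%} chance the rebuild hits an unrecoverable error")
```

Under these assumptions it prints roughly 7.69% and 0.80%; the 10x better error rate buys almost exactly a 10x lower rebuild failure rate because N*UER is small in both cases.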
Hopefully RAID 6 or ZFS's raidz2 works well enough with cheap drives!

> > > so there are 4 ways to fail. 3 double fail have a probability of
> > > 3*(2^9 bits * 1e-14 1/ bit)^2
> >
> > Why 2^9 bits? A sector is 2^9 bytes or 2^12 bits.
>
> cut-and-paste error. sorry, that was 2^19 bits, i.e. 64k*8 bits/byte.
> the calculation is still correct, since it was done on that basis.

Ok.

> > If per sector recovery is done, you have
> > 3E-22*(64K/512) = 3.84E-20
>
> i'd be interested to know if anyone does this. it's not
> as easy as it would first appear. do you know of any
> hardware or software that does sector-level recovery?

No idea -- I haven't really looked in this area in ages. If two
stripes are bad, it would make sense to me to reread a stripe one
sector at a time, since the chance of the exact same sector being bad
on two disks is much lower (about 2^14 times smaller for 64k
stripes?). I don't know if disk drives return an error bit array along
with the data of a multisector read (nth bit set if the nth sector
could not be recovered). If not, that would be a worthwhile addition.

> i don't have enough data to know how likely it is to
> have exactly 1 bad sector. any references?

Not sure what you are asking. Reed-Solomon codes are block codes
applied to a whole sector, so the per-sector error rate is UER*512*8,
where UER == uncorrectable error rate. [Early IDE disks had a 4-byte
ECC per sector. Now that bits are packed so tightly, the S/N ratio is
far worse and the ECC is at least 40 bytes, to keep the UER at 1E-14
or whatever the target is.]
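A rough check of the sector-versus-stripe arithmetic, assuming (as the thread does) independent errors, a UER of 1e-14, and a per-sector error probability of about UER*512*8. The factor of 3 in the thread's double-failure formula cancels in these ratios, so it is left out. Under this simple model the answer to the "2^14?" question depends on what is compared: for one particular sector position the reduction is 128^2 = 2^14, while summed over all 128 sectors of a 64k stripe it is 128 = 2^7. All names below are illustrative, and correlated failures are ignored.

```python
import math

UER = 1e-14                      # uncorrectable errors per bit read (assumed spec value)
SECTOR_BITS = 512 * 8            # 2^12 bits per 512-byte sector
STRIPE_BITS = 64 * 1024 * 8      # 2^19 bits per 64k stripe
SECTORS_PER_STRIPE = STRIPE_BITS // SECTOR_BITS  # 128

def unit_error_prob(bits: int, uer: float = UER) -> float:
    """Probability that a unit of `bits` bits is unreadable, 1 - (1 - uer)**bits.
    For small bits*uer this is ~bits*uer (e.g. UER*512*8 for one sector)."""
    return -math.expm1(bits * math.log1p(-uer))

p_sector = unit_error_prob(SECTOR_BITS)
p_stripe = unit_error_prob(STRIPE_BITS)

# Stripe-level recovery fails if the stripe is bad somewhere on both disks.
both_stripes_bad = p_stripe ** 2
# Sector-level recovery fails only if some sector index is bad on both disks
# (union-bound approximation, accurate because p_sector is tiny).
same_sector_bad = SECTORS_PER_STRIPE * p_sector ** 2

print(f"per-sector error prob: {p_sector:.3e}")
print(f"per-position ratio  : {(p_stripe / p_sector) ** 2:.0f}")          # ~2^14
print(f"whole-stripe ratio  : {both_stripes_bad / same_sector_bad:.0f}")  # ~2^7
```

It should print a per-sector probability of about 4.096e-11 and ratios of about 16384 and 128.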