On Thu, 8 May 2008, Ross Smith wrote:

> True, but I'm seeing more and more articles pointing out that the 
> risk of a secondary failure is increasing as disks grow in size, and

Quite true.

> While I'm not sure of the actual error rates (Western Digital lists 
> their unrecoverable rates as < 1 in 10^15), I'm very conscious that 
> if you have any one disk fail completely, you are then reliant on 
> being able to read without error every single bit of data from every 
> other disk in that RAID set.  I'd much rather have dual parity and 
> know that single-bit errors are still easily recoverable during the 
> rebuild process.

I understand the concern.  However, the published unrecoverable rates 
describe the completely random write/read case.  ZFS verifies a 
checksum on every read and repairs the block if the read turns out to 
be faulty.  Running a scrub ("zpool scrub") forces all of the data to 
be read, and repaired if necessary.  Assuming that the data is read 
(and repaired if necessary) on a periodic basis, the chance of hitting 
an unrecoverable read during a rebuild should be dramatically lower 
than the published figure.  This of course assumes that the system 
administrator pays attention and proactively replaces disks that are 
reporting unusually high and increasing read failure rates.
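
As an illustration, a weekly scrub could be scheduled from cron (the 
pool name "tank" here is just a placeholder):

  # crontab entry: scrub pool "tank" every Sunday at 03:00
  0 3 * * 0 /usr/sbin/zpool scrub tank

Afterwards, "zpool status tank" reports the scrub results, including 
any checksum errors that were found and repaired.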

It is a simple matter of statistics.  The published rate is an 
unconditional average over all blocks and all drives, while each 
successful, checksum-verified read is evidence about a particular 
block on a particular drive.  If you have read a disk block 
successfully 1000 times, what is the probability that the next read 
from that block will spontaneously fail?  How about if you have read 
from it successfully a million times?
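
To put the quoted figure in perspective, here is a back-of-the-envelope 
calculation (assuming the 1-in-10^15 unrecoverable bit error rate 
mentioned above and a 1 TB disk read end to end; both numbers are 
illustrative only):

  bits read in one full-disk pass = 10^12 bytes * 8 = 8 x 10^12
  P(no unrecoverable error)  = (1 - 10^-15)^(8 x 10^12)
                             ~ exp(-8 x 10^12 * 10^-15)
                             ~ exp(-0.008) ~ 0.992
  P(at least one unrecoverable error) ~ 0.8%

And that 0.8% is the unconditional worst case for a single complete 
pass; blocks which have recently been scrubbed without error, on a 
drive that is not logging failures, should do considerably better 
than that blanket average.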

Assuming a reasonably designed storage system, the most likely cause 
of data loss is human error due to carelessness or confusion.

Bob
======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss