> what are you terming as "ZFS' incremental risk reduction"?

I'm not Bill, but I'll try to explain.
Compare a system using ZFS to one using another file system -- say, UFS, XFS, or ext3. Consider which situations may lead to data loss in each case, and the probability of each such situation. The difference between those two sets is the "incremental risk reduction" provided by ZFS. So, for instance, assuming you're using ZFS RAID in the first case and a traditional RAID implementation in the second case:

* Single-disk failure ==> same probability of occurrence, no data loss in either case.

* Double-disk failure ==> same probability of occurrence, no data loss in either case assuming RAID6/RAIDZ2 (or data loss in both cases assuming RAID5/RAIDZ).

* Uncorrectable read error ==> same probability of occurrence, no data loss in either case.

* Single-bit error on the wire ==> same probability, no data loss in either case.

* Multi-bit error on the wire, detected by CRC ==> same probability, no data loss.

* Multi-bit error on the wire, undetected by CRC ==> This is the first interesting case, since the two systems differ: ZFS will detect and correct the error, and the standard RAID will not (a toy model of that end-to-end check is sketched at the end of this post). The probability of occurrence is hard to compute, since it depends on the distribution of bit errors on the wire, which aren't really independent. Roughly, though, since the wire transfers usually use a 32-bit CRC, the probability of an undetected error is 2^-32, or about 0.000 000 023 2% (see the first arithmetic sketch after this list). [You could ask whether this holds for real data. It appears to; see "Performance of Checksums and CRCs over Real Data" by Stone, Greenwald, Partridge & Hughes.]

* Error in the file system code ==> Another interesting case, but we don't have sufficient data to gauge the probabilities.

* Undetected error in host memory ==> same probability of occurrence, same data loss.

* Undetected error in RAID controller memory ==> same probability, but data loss only in the non-ZFS case. We can estimate the probability of this, though I don't have current data. Single-bit errors were measured at a rate of about 2*10^-12 on a number of systems in the mid-1990s (see "Single Event Upset at Ground Level" by Eugene Normand). If the bits are separated spatially (as is normally done), the probability of a double-bit error is roughly 4*10^-24, and of a triple-bit error, 8*10^-36 (see the second sketch below). So an undetected error is very, VERY unlikely, at least from RAM cell effects. But ZFS can correct it, if it happens.

* Failure of facility (e.g. fire, flood, power surge) ==> same probability, total loss of data in both cases. [Total loss if you don't have a backup, of course.]

... go on as desired.
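
To make the CRC figure concrete, here's a quick back-of-the-envelope calculation in Python. The only assumption is the usual rough model that a random multi-bit error pattern is equally likely to map to any of the 2^32 checksum values, which (as the Stone et al. paper shows) is only approximately true for real traffic:

    # Back-of-the-envelope check of the 2^-32 figure quoted above.
    # Assumption: a corrupted transfer is equally likely to land on any of the
    # 2^32 CRC values, so roughly 1 in 2^32 corrupted transfers slips past the
    # 32-bit CRC undetected. Treat this as an order-of-magnitude estimate only.

    undetected_fraction = 2 ** -32
    print(f"fraction of corrupted transfers missed: {undetected_fraction:.3e}")
    print(f"as a percentage: {undetected_fraction * 100:.10f}%")
    # -> about 2.33e-10, i.e. the 0.000 000 023 2% quoted above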
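
Likewise for the memory-error figures: assuming independent bit flips at the mid-1990s rate of 2*10^-12 from Normand's paper (modern parts will differ, so plug in your own number), the double- and triple-bit probabilities quoted above are just the square and cube of the single-bit rate:

    # Reproduces the memory-error arithmetic above, assuming independent bit
    # flips (reasonable when the bits of a word are physically interleaved
    # across the module). The 2e-12 single-bit rate is the mid-1990s figure;
    # it is an input assumption, not a property of current hardware.

    single_bit = 2e-12            # single-bit upset probability
    double_bit = single_bit ** 2  # two independent flips in the same word
    triple_bit = single_bit ** 3  # three independent flips

    print(f"single-bit: {single_bit:.0e}")   # 2e-12
    print(f"double-bit: {double_bit:.0e}")   # ~4e-24
    print(f"triple-bit: {triple_bit:.0e}")   # ~8e-36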
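
Finally, for anyone wondering *why* ZFS comes out ahead in the "undetected on the wire / in RAID memory" rows: the checksum of every block is stored in the parent block pointer and verified in the host when the block is read back, so corruption introduced anywhere below the host can be caught and repaired from another copy (mirror, RAIDZ reconstruction, or ditto block). The sketch below is a toy model of that idea only, not ZFS code -- the function names are made up for illustration, and SHA-256 stands in for ZFS's fletcher/SHA-256 block checksums:

    import hashlib

    # Toy model of an end-to-end checksum: the expected checksum travels with
    # the block pointer (computed when the block was written), not with the
    # device, so corruption introduced below the host -- wire, HBA, RAID
    # controller cache -- is caught on read and repaired from a redundant copy.
    # Illustrative sketch only; names here are invented for the example.

    def checksum(data: bytes) -> bytes:
        # Stand-in for ZFS's fletcher4/sha256 block checksum.
        return hashlib.sha256(data).digest()

    def read_block(copies: list[bytes], expected_checksum: bytes) -> bytes:
        """Return the first copy whose checksum matches: 'self-healing' in miniature."""
        for copy in copies:
            if checksum(copy) == expected_checksum:
                return copy
        raise IOError("all copies failed the checksum -- unrecoverable block")

    # Example: one copy corrupted in flight, the other intact.
    original = b"some file data"
    expected = checksum(original)       # stored in the parent block pointer at write time
    corrupted = b"some file dat\x00"    # a multi-bit error the transport CRC happened to miss
    good = read_block([corrupted, original], expected)
    assert good == original

A traditional RAID stack has no equivalent check on the read path: parity is normally consulted only when a device reports an error, so data that arrives silently corrupted is passed straight up to the application.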