> what are you terming as "ZFS' incremental risk reduction"?

I'm not Bill, but I'll try to explain.
Compare a system using ZFS to one using another file system -- say, UFS, XFS, or ext3. Consider which situations may lead to data loss in each case, and the probability of each such situation. The difference between those two sets is the "incremental risk reduction" provided by ZFS. So, for instance, assuming you're using ZFS RAID in the first case and a traditional RAID implementation in the second case:

* Single-disk failure ==> same probability of occurrence, no data loss in either case.

* Double-disk failure ==> same probability of occurrence, no data loss in either case assuming RAID6/RAIDZ2 (or data loss in both cases assuming RAID5/RAIDZ).

* Uncorrectable read error ==> same probability of occurrence, no data loss in either case.

* Single-bit error on the wire ==> same probability, no data loss in either case.

* Multi-bit error on the wire, detected by CRC ==> same probability, no data loss.

* Multi-bit error on the wire, undetected by CRC ==> This is the first interesting case, since the two systems differ: ZFS will detect and correct the error, and the standard RAID will not (a toy model of that end-to-end check is sketched at the end of this post). The probability of occurrence is hard to compute, since it depends on the distribution of bit errors on the wire, which aren't really independent. Roughly, though, since the wire transfers usually use a 32-bit CRC, the probability of an undetected error is 2^-32, or about 0.000 000 023 2% (see the first arithmetic sketch after this list). [You could ask whether this holds for real data. It appears to; see "Performance of Checksums and CRCs over Real Data" by Stone, Greenwald, Partridge & Hughes.]

* Error in the file system code ==> Another interesting case, but we don't have sufficient data to gauge the probabilities.

* Undetected error in host memory ==> same probability of occurrence, same data loss.

* Undetected error in RAID controller memory ==> same probability, but data loss only in the non-ZFS case. We can estimate the probability of this, though I don't have current data. Single-bit errors were measured at a rate of about 2*10^-12 on a number of systems in the mid-1990s (see "Single Event Upset at Ground Level" by Eugene Normand). If the bits are separated spatially (as is normally done), the probability of a double-bit error is roughly 4*10^-24, and of a triple-bit error, 8*10^-36 (see the second sketch below). So an undetected error is very, VERY unlikely, at least from RAM cell effects. But ZFS can correct it, if it happens.

* Failure of facility (e.g. fire, flood, power surge) ==> same probability, total loss of data in both cases. [Total loss if you don't have a backup, of course.]

... go on as desired.
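
To make the CRC figure concrete, here's a quick back-of-the-envelope calculation in Python. The only assumption is the usual rough model that a random multi-bit error pattern is equally likely to map to any of the 2^32 checksum values, which (as the Stone et al. paper shows) is only approximately true for real traffic:

    # Back-of-the-envelope check of the 2^-32 figure quoted above.
    # Assumption: a corrupted transfer is equally likely to land on any of the
    # 2^32 CRC values, so roughly 1 in 2^32 corrupted transfers slips past the
    # 32-bit CRC undetected. Treat this as an order-of-magnitude estimate only.

    undetected_fraction = 2 ** -32
    print(f"fraction of corrupted transfers missed: {undetected_fraction:.3e}")
    print(f"as a percentage: {undetected_fraction * 100:.10f}%")
    # -> about 2.33e-10, i.e. the 0.000 000 023 2% quoted above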
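
Likewise for the memory-error figures: assuming independent bit flips at the mid-1990s rate of 2*10^-12 from Normand's paper (modern parts will differ, so plug in your own number), the double- and triple-bit probabilities quoted above are just the square and cube of the single-bit rate:

    # Reproduces the memory-error arithmetic above, assuming independent bit
    # flips (reasonable when the bits of a word are physically interleaved
    # across the module). The 2e-12 single-bit rate is the mid-1990s figure;
    # it is an input assumption, not a property of current hardware.

    single_bit = 2e-12            # single-bit upset probability
    double_bit = single_bit ** 2  # two independent flips in the same word
    triple_bit = single_bit ** 3  # three independent flips

    print(f"single-bit: {single_bit:.0e}")   # 2e-12
    print(f"double-bit: {double_bit:.0e}")   # ~4e-24
    print(f"triple-bit: {triple_bit:.0e}")   # ~8e-36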
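
Finally, for anyone wondering *why* ZFS comes out ahead in the "undetected on the wire / in RAID memory" rows: the checksum of every block is stored in the parent block pointer and verified in the host when the block is read back, so corruption introduced anywhere below the host can be caught and repaired from another copy (mirror, RAIDZ reconstruction, or ditto block). The sketch below is a toy model of that idea only, not ZFS code -- the function names are made up for illustration, and SHA-256 stands in for ZFS's fletcher/SHA-256 block checksums:

    import hashlib

    # Toy model of an end-to-end checksum: the expected checksum travels with
    # the block pointer (computed when the block was written), not with the
    # device, so corruption introduced below the host -- wire, HBA, RAID
    # controller cache -- is caught on read and repaired from a redundant copy.
    # Illustrative sketch only; names here are invented for the example.

    def checksum(data: bytes) -> bytes:
        # Stand-in for ZFS's fletcher4/sha256 block checksum.
        return hashlib.sha256(data).digest()

    def read_block(copies: list[bytes], expected_checksum: bytes) -> bytes:
        """Return the first copy whose checksum matches: 'self-healing' in miniature."""
        for copy in copies:
            if checksum(copy) == expected_checksum:
                return copy
        raise IOError("all copies failed the checksum -- unrecoverable block")

    # Example: one copy corrupted in flight, the other intact.
    original = b"some file data"
    expected = checksum(original)       # stored in the parent block pointer at write time
    corrupted = b"some file dat\x00"    # a multi-bit error the transport CRC happened to miss
    good = read_block([corrupted, original], expected)
    assert good == original

A traditional RAID stack has no equivalent check on the read path: parity is normally consulted only when a device reports an error, so data that arrives silently corrupted is passed straight up to the application.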