> > You have to detect the problem first. ZFS is in a much better position
> > to detect the problem due to block checksums.
>
> Bulls***, to quote another poster here who has since been strangely quiet.
> The vast majority of what ZFS can detect (save for *extremely* rare
> undetectable bit-rot and for real hardware (path-related) errors that
> studies like CERN's have found to be very rare - and you have yet to
> provide even anecdotal evidence to the contrary)
You wanted anecdotal evidence: in my personal experience with only two home
machines, ZFS has helped me detect corruption at least three times in a
period of a few months. Once it was silent corruption caused by a controller
bug (and a driver that did not work around it). Another time it was
corruption during hot-swapping (though that one does not necessarily count,
since I did it on hardware that I did not know was supposed to support it,
and I would not have attempted it otherwise). The third time I don't
remember now, so you may disregard it if you wish.

In my professional life I have seen bit flips a few times in the middle of
real live data on "real" servers used for important data. As a result I
have become pretty paranoid about it all, making heavy use of par2. (I have
also seen various file system corruption / system instability issues that
may very well be consistent with bit flips or other forms of corruption,
but where there was no proof of the underlying cause.)

> can also be detected by scrubbing, and it's arguably a lot easier to
> apply brute-force scrubbing (e.g., by scheduling a job that periodically
> copies your data to the null device if your system does not otherwise
> support the mechanism) than to switch your file system.

How would your magic scrubbing detect arbitrary data corruption without
checksumming or redundancy? (See the sketch further down.) A lot of the
data people save does not have checksumming. Even if it does, the file
system metadata typically does not, nor does the various minor information
related to the data (say, the metadata associated with your backup of your
other data, even if that data has some internal checksumming).

I think one needs to stop making excuses by pointing at properties of
specific file types and similar. You can always use FEC to do error
correction on arbitrary files if you really feel they are important. But
the point is that with ZFS you get detection of *ANY* bit error for free
(essentially), and optionally correction if you have redundancy. It doesn't
matter whether it's internal file system metadata, that important file you
didn't consider important from a corruption perspective, or the middle of
some larger file that you may or may not have applied FEC to otherwise.

Even without fancy high-end requirements, it is nice to have a good
statistical reason to believe that random corruption does not occur. Even
if it is only to drive your web browser or e-mail client, at least you can
be sure that random bit flips (unless they are undetected due to an
implementation bug, or occur in memory/etc.) are not the cause of your
random application misbehavior.

It's like choosing RAM. You can make excuses all you want about doing
proper testing, buying good RAM, having redundancy at other levels, etc. -
but you will still sleep better knowing you have ECC RAM than some random
junk. Or take the seat belt analogy: you can try to convince yourself and
other people all you want that you are a safe driver and that you should
not drive in a way that allows crashes - but you are still going to be
safer with a seat belt than without it.

This is also why we care about fsync() (also sketched below). It doesn't
matter that you spent $100000 on that expensive server with redundant PSUs
hooked up to redundant UPS systems. *SHIT HAPPENS*, and when it does, you
want to be maximally protected.

Yes, ZFS is not perfect.
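To make the scrubbing point concrete, here is a minimal sketch (Python,
nothing to do with ZFS's actual implementation; the manifest file name and
command names are made up for illustration) of what it actually takes to
*detect* silent corruption: you need checksums recorded while the data was
known good, not just a job that reads every byte.

#!/usr/bin/env python3
"""Sketch: brute-force reading only surfaces errors the drive reports.
To catch silent bit-flips you must compare against checksums recorded
when the data was known good - which is what ZFS does per block."""
import hashlib
import json
import os
import sys

MANIFEST = "checksums.json"   # hypothetical sidecar file of known-good hashes

def sha256_of(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def record(root):
    """Walk the tree and record a hash for every file while it is known good."""
    sums = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            p = os.path.join(dirpath, name)
            sums[p] = sha256_of(p)
    with open(MANIFEST, "w") as f:
        json.dump(sums, f, indent=2)

def scrub():
    """Re-read everything and flag any file whose contents silently changed."""
    with open(MANIFEST) as f:
        sums = json.load(f)
    bad = [p for p, digest in sums.items() if sha256_of(p) != digest]
    for p in bad:
        print("CHECKSUM MISMATCH: " + p, file=sys.stderr)
    return 1 if bad else 0

if __name__ == "__main__":
    if sys.argv[1:2] == ["record"]:
        record(sys.argv[2])
    else:
        sys.exit(scrub())

And of course this only covers files you remembered to record, only from
the moment you recorded them, cannot tell corruption from a legitimate
change, and the manifest itself is unprotected. ZFS checksums every block,
data and metadata alike, transparently at write time - which is what "for
free" means above.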
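And since fsync() came up: here is a minimal sketch (again Python, file
name and helper name are made up) of what "doing fsync()" means in
practice. Without the fsync calls, a "successful" write may still live only
in volatile caches when the power goes out.

#!/usr/bin/env python3
"""Sketch: make a small write durable before reporting success."""
import os

def durable_write(path, data):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)          # may still sit in Python's userspace buffer
        f.flush()              # push the buffer into the kernel page cache
        os.fsync(f.fileno())   # ask the kernel to push it to stable storage
    os.replace(tmp, path)      # atomic rename: readers never see a torn file
    # fsync the directory as well, so the rename itself survives a crash
    dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)

if __name__ == "__main__":
    durable_write("important.dat", b"data you would rather not lose\n")

This is roughly what a database that honours fsync() does on every commit;
skip those calls and the write can simply evaporate with the power.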
But to me, both in the context of personal use and more serious use, ZFS
is, barring some implementation details, more or less exactly what I have
always wanted, and it solves pretty much all of the major problems with
storage. And let me be clear: that is not hype. It is ZFS actually
providing what I have wanted, and what I knew I wanted even before ZFS (or
WAFL or whatever else) was ever on my radar.

For some reason some people seem to disagree. That's your business. But the
next time you have a power outage, you'll be sorry if you had a database
that didn't do fsync()[1], a file system that had no corruption checking
whatsoever[2], a RAID5 system that didn't care about parity correctness in
the face of a crash[3], and a file system or application whose data is not
structured such that you can ascertain *what* is broken after the crash and
what is not[4]. You will be even more sorry two years later when something
really important malfunctions as a result of undetected corruption two
years earlier...

[1] Because of course all serious players use proper UPS and a power outage
    should never happen unless you suck. (This has actually been advocated
    to me. Seriously.)
[2] Because of [1], and because of course you only run stable software that
    is well tested and will never be buggy. (This has been advocated.
    Seriously.)
[3] Because of [1].
[4] Because of [1], [2] and [3].

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org