>>>>> "gm" == Gary Mills <mi...@cc.umanitoba.ca> writes:
gm> There are many different components that could contribute to
gm> such errors.

yes of course.

gm> Since only the lower ZFS has data redundancy, only it can
gm> correct the error.

um, no?  An example already pointed out: kerberized NFS will detect
network errors that sneak past the weak TCP checksum, and resend the
data.  This will work even on an unredundant, unchecksummed UFS
filesystem to correct network-induced errors.  There is no need for NFS
to ``inform'' UFS so that UFS can use ``redundancy'' to ``correct the
error''.  UFS never hears anything, and doesn't have any redundancy.
NFS resends the data.  done.

iSCSI also has application-level CRC's, separately enableable for
headers and data.  not sure what FC has.

It doesn't make any sense to me that some higher layer would call back
to the ZFS stack on the bottom and tell it to twiddle with disks
because there's a problem with the network.

An idea Richard brought up months ago was ``protection domains'': that
it might be good to expose ZFS checksums to higher levels, to stretch a
single protection domain as far upwards in the stack as possible.

Application-level checksums also form a single protection domain, but
only for _reading_.  Suppose corruption happens in RAM or on the
network (where the ZFS backing store cannot detect it), while reading a
gzip file on an NFS client.  gzip will always warn you!  This is
end-to-end, and will warn you just as reliably as a hypothetical
end-to-end networkified ZFS.  The problem: there's no way for gzip to
``retry''.  You can run gunzip again, but it will just fail again and
again, because the file with network-induced errors is cached on the
NFS client.  It's the ``cached badness'' problem Richard alluded to.
You would have to reboot the NFS client to clear its read cache, then
try gunzip again.  This is probably good enough in practice, but it
sounds like there's room for improvement.  It's irrelevant in this
scenario that the lower ZFS has ``redundancy''.  All you have to do to
fix the problem is resend the read over the network.

What would be nice to have, and what we don't have, is a way of keeping
ZFS block checksums attached to the data as it travels over the
network, until it reaches the something-like-an-NFS-client.  Each part
of the stack that caches data could be trained either (1) to validate
ZFS block checksums, or (2) to obey ``read no-cache'' commands passed
down from the layer above.  In the application-level gzip example, gzip
has no way of doing (2), so extending the protection domain upward
seems more practical than pushing cache-flushing obedience downward.

For writing, application-level checksums do NOT work at all, because
you would write corrupt data to the disk and notice it only later, when
you read it back, when there's nothing you can do to fix it.  ZFS
redundancy will not help you here either, because you write the corrupt
data redundantly!  With a single protection domain for writing, the
write would arrive at ZFS along with a never-regenerated checksum
wrapper-seal attached to it by the something-like-an-NFS-client.  Just
before ZFS sends the write to the disk driver, ZFS would crack the
protection domain open, validate the checksum, reblock the write, and
send it to disk with a new checksum.  (so, ``single protection domain''
is really a single domain for reads, and two protection domains for
writes.)  If the checksum does not match, ZFS must convince the writing
client to resend---in the write direction I think cached bad data will
be less of a problem.
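to make the write direction concrete, here's a toy sketch in python.
everything in it is made up for illustration---seal(), commit(), the
dict ``wire format''---none of it is a real NFS or ZFS interface.  the
client seals the write with a checksum that is never regenerated along
the way, and the server end of the domain validates the seal just
before committing, signalling ``resend'' on a mismatch:

import hashlib
import os

def seal(data: bytes) -> dict:
    """Client side: attach a checksum that travels with the data unchanged."""
    return {"payload": data, "digest": hashlib.sha256(data).hexdigest()}

def commit(block: dict, backing: dict, key: str) -> bool:
    """Server side (the something-like-ZFS end of the write domain):
    crack the seal open, validate it, and only then reblock and store.
    Returns False to mean 'please resend' when the seal does not match."""
    payload, digest = block["payload"], block["digest"]
    if hashlib.sha256(payload).hexdigest() != digest:
        return False           # corrupted in RAM or on the wire: ask for a resend
    backing[key] = payload     # stand-in for reblocking + writing with a new checksum
    return True

# usage: the client simply retries until the sealed write is accepted
store = {}
data = os.urandom(4096)
while not commit(seal(data), store, "blk0"):
    pass                       # a real protocol would bound the retries

the point of the sketch is only the shape of it: the checksum crosses
RAM and the network untouched, and the retry happens at the writer, not
by twiddling redundant copies on the platters.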
I think a single protection domain, rather than the currently
best-obtainable arrangement---sliced domains where the slices butt up
against each other as closely as possible---is an idea with merit.  but
it doesn't have anything whatsoever to do with the fact that ZFS stores
things redundantly on the platters.  The whole thing would have just as
much merit, and would fix the new problem classes it addresses just as
often, for single-disk vdev's as for redundant vdev's.

gm> Of course, if something in the data path consistently corrupts
gm> the data regardless of its origin, it won't be able to correct
gm> the error.

TCP does this all the time, right?  see, watch this: +++ATH0

:)

that aside, your idea of ``the error'' seems too general, like the
annoying marketing slicks with the ``healing'' and ``correcting''
stuff.  stored, transmitted, and cached errors are relevantly
different, which also means corruption in the read and write directions
are different.
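and for the read direction, here's the matching toy sketch, again with
made-up names (CheckedReader, fetch())---nothing here is a real NFS
client API.  it shows what ``keeping the block checksum attached to the
data all the way to the client'' buys you: a cached bad block gets
detected and re-fetched instead of being served forever, which is
exactly the cached-badness case gzip can't get itself out of:

import hashlib

class CheckedReader:
    """Hypothetical client-side cache that keeps the server's block
    checksum attached to the cached copy, so bad cached data can be
    dropped and re-fetched instead of failing every read forever."""

    def __init__(self, fetch):
        self.fetch = fetch          # fetch(key) -> (data, digest) from the server
        self.cache = {}

    def read(self, key: str, retries: int = 3) -> bytes:
        for _ in range(retries):
            data, digest = self.cache.get(key) or self.fetch(key)
            if hashlib.sha256(data).hexdigest() == digest:
                self.cache[key] = (data, digest)
                return data
            self.cache.pop(key, None)   # drop the bad copy; next pass refetches
        raise IOError("checksum mismatch on %s after %d reads" % (key, retries))

note it validates on every read, not just on fill, so corruption
introduced by the cache itself (RAM, not just the wire) is caught too.
and again: none of this cares whether the vdev underneath is redundant.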