Anton B. Rang wrote:
>> From the ZFS Administration Guide, Chapter 11, Data Repair section:
>> Given that the fsck utility is designed to repair known pathologies
>> specific to individual file systems, writing such a utility for a file
>> system with no known pathologies is impossible.
>>
>
> That's a fallacy (and is incorrect even for the UFS fsck; refer to the
> McKusick/Kowalski paper and the distinction they make between 'expected'
> corruptions and other inconsistencies).
>
> First, there are two types of utilities which might be useful in the
> situation where a ZFS pool has become corrupted. The first is a file system
> checking utility (call it zfsck); the second is a data recovery utility. The
> difference between those is that the first tries to bring the pool (or file
> system) back to a usable state, while the second simply tries to recover the
> files to a new location.
>
Hi Anton,

How would you describe the difference between the file system checking
utility and zpool scrub? Is zpool scrub lacking in its verification of
the data?

How would you describe the difference between the data recovery utility
and ZFS's normal data recovery process?

> What does a file system check do? It verifies that a file system is
> internally consistent, and makes it consistent if it is not. If ZFS were
> always consistent on disk, then only a verification would be needed. Since
> we have evidence that it is not always consistent in the face of hardware
> failures, at least, repair may also be needed. This doesn't need to be that
> hard. For instance, the space maps can be reconstructed by walking the
> various block trees; the uberblock effectively has several backups (though it
> might be better in some cases if an older backup were retained); and the ZFS
> checksums make it easy to identify block types and detect bad pointers. Files
> can be marked as damaged if they contain pointers to bad data; directories
> can be repaired if their hash structures are damaged (as long as the names
> and pointers can be salvaged); etc. Much more complex file systems than ZFS
> have file system checking utilities, because journaling, COW, etc. don't help
> you in the face of software bugs or certain classes of hardware failures.
>
> A recovery tool is even simpler, because all it needs to do is find a tree
> root and then walk the file system, discovering directories and files,
> verifying that each of them is readable by using the checksums to check
> intermediate and leaf blocks, and extracting the data. The tricky bit with
> ZFS is simply identifying a relatively new root, so that the newest copy of
> the data can be identified.
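The recovery walk described in the quoted text (pick a recent root, walk the tree, verify every block against the checksum held in its parent, extract what survives) and the idea of rebuilding the space maps by walking the block trees can be sketched on a toy model. Everything here is hypothetical: `Block`, `BlockPtr`, the dict standing in for a disk, the `(txg, pointer)` list standing in for uberblocks, and the root -> file -> data layout are illustrative assumptions, not the ZFS on-disk format or any real ZFS tool.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical toy model: a dict maps integer addresses to Block objects.
# This is NOT the ZFS on-disk format; it only mimics the idea that a block's
# checksum is stored in the parent's pointer, so every hop can be verified.


@dataclass(frozen=True)
class BlockPtr:
    addr: int
    cksum: str  # checksum the parent expects the child to have


@dataclass
class Block:
    payload: bytes
    children: tuple = ()  # tuple of BlockPtr


def block_checksum(block):
    # Cover both the payload and the child pointers, so a damaged
    # pointer inside an intermediate block is also detected.
    h = hashlib.sha256(block.payload)
    for ptr in block.children:
        h.update(f"{ptr.addr}:{ptr.cksum}".encode())
    return h.hexdigest()


def put(disk, addr, block):
    """Store a block and return a pointer carrying its checksum."""
    disk[addr] = block
    return BlockPtr(addr, block_checksum(block))


def read_verified(disk, ptr):
    """Return the block if it exists and matches the expected checksum."""
    block = disk.get(ptr.addr)
    if block is None or block_checksum(block) != ptr.cksum:
        return None
    return block


def newest_valid_root(disk, uberblocks):
    """uberblocks: list of (txg, BlockPtr); try the newest txg first."""
    for txg, ptr in sorted(uberblocks, key=lambda u: u[0], reverse=True):
        root = read_verified(disk, ptr)
        if root is not None:
            return txg, ptr, root
    return None


def walk(disk, ptr, allocated):
    """Return (data, ok) for the subtree at ptr; record verified addresses."""
    block = read_verified(disk, ptr)
    if block is None:
        return b"", False
    allocated.add(ptr.addr)  # rebuilt "space map": blocks reached and verified
    if not block.children:
        return block.payload, True
    parts, ok = [], True
    for child in block.children:
        data, child_ok = walk(disk, child, allocated)
        parts.append(data)
        ok = ok and child_ok
    return b"".join(parts), ok


def recover(disk, uberblocks):
    """Extract intact files from the newest valid root; list damaged ones."""
    found = newest_valid_root(disk, uberblocks)
    if found is None:
        return None
    txg, root_ptr, root = found
    allocated = {root_ptr.addr}
    files, damaged = {}, []
    for file_ptr in root.children:  # toy layout: root -> files -> data blocks
        fblock = read_verified(disk, file_ptr)
        if fblock is None:
            damaged.append(f"<unreadable file block at {file_ptr.addr}>")
            continue
        allocated.add(file_ptr.addr)
        parts, ok = [], True
        for data_ptr in fblock.children:
            data, child_ok = walk(disk, data_ptr, allocated)
            parts.append(data)
            ok = ok and child_ok
        name = fblock.payload.decode()
        if ok:
            files[name] = b"".join(parts)
        else:
            damaged.append(name)
    return txg, files, damaged, allocated
```

In this sketch, corrupting one data block leaves its file listed as damaged while sibling files are still extracted, and corrupting the newest root makes `newest_valid_root` fall back to an older uberblock. The toy only counts verified blocks as allocated; a real checker would need its own policy for blocks that sit behind bad pointers.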
>
> Almost every file system starts out without an fsck utility, and implements
> one once it becomes obvious that "sorry, you have to reinitialize the file
> system" -- or worse, "sorry, we lost all of your data" -- is unacceptable to
> a certain proportion of customers.
>
>

Nobody thinks that an answer of "sorry, we lost all of your data" is
acceptable. However, there are failures which will result in loss of data
no matter how clever the file system is. Yet people still believe their
hardware is infallible and refuse to configure ZFS with the redundancy it
needs to repair their data. You can only push a rope so far...
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss