> From the ZFS Administration Guide, Chapter 11, Data Repair section:
>
> Given that the fsck utility is designed to repair known pathologies
> specific to individual file systems, writing such a utility for a file
> system with no known pathologies is impossible.
That's a fallacy (and is incorrect even for the UFS fsck; refer to the McKusick/Kowalski paper and the distinction they make between 'expected' corruptions and other inconsistencies).

First, there are two types of utilities that might be useful when a ZFS pool has become corrupted. The first is a file system checking utility (call it zfsck); the second is a data recovery utility. The difference between them is that the first tries to bring the pool (or file system) back to a usable state, while the second simply tries to recover the files to a new location.

What does a file system check do? It verifies that a file system is internally consistent, and makes it consistent if it is not. If ZFS were always consistent on disk, then only a verification would be needed. Since we have evidence that it is not always consistent, at least in the face of hardware failures, repair may also be needed.

This doesn't need to be that hard. For instance, the space maps can be reconstructed by walking the various block trees (see the first sketch at the end of this message); the uberblock effectively has several backups (though it might be better in some cases if an older backup were retained); and the ZFS checksums make it easy to identify block types and detect bad pointers. Files can be marked as damaged if they contain pointers to bad data; directories can be repaired if their hash structures are damaged (as long as the names and pointers can be salvaged); etc.

Much more complex file systems than ZFS have file system checking utilities, because journaling, COW, etc. don't help you in the face of software bugs or certain classes of hardware failures.

A recovery tool is even simpler, because all it needs to do is find a tree root and then walk the file system, discovering directories and files, verifying that each of them is readable by using the checksums to check intermediate and leaf blocks, and extracting the data. The tricky bit with ZFS is simply identifying a relatively new root, so that the newest copy of the data can be recovered (see the second sketch below).

Almost every file system starts out without an fsck utility, and implements one once it becomes obvious that "sorry, you have to reinitialize the file system" -- or worse, "sorry, we lost all of your data" -- is unacceptable to a certain proportion of customers.
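To make the "walk the block trees" point a little more concrete, here is a toy sketch (Python, with a made-up block-pointer layout -- nothing here is the real ZFS on-disk format, and BLOCKS, BlockPtr, write_block, and walk are invented names) of how a zfsck-style pass could rebuild an allocation map and flag blocks whose checksums no longer verify:

    # Toy model: a fake "disk" of offset -> raw block contents, plus block
    # pointers that carry a checksum of the block they point at.
    import hashlib
    from dataclasses import dataclass

    BLOCKS = {}

    @dataclass
    class BlockPtr:
        offset: int          # where the block lives on the fake disk
        checksum: bytes      # checksum of the block's contents
        children: tuple      # child BlockPtrs for indirect blocks, () for data

    def write_block(offset, data, children=()):
        BLOCKS[offset] = data
        return BlockPtr(offset, hashlib.sha256(data).digest(), children)

    def walk(bp, allocated, damaged):
        """Mark every verifiable block as allocated; report bad pointers."""
        data = BLOCKS.get(bp.offset)
        if data is None or hashlib.sha256(data).digest() != bp.checksum:
            damaged.append(bp.offset)    # the file owning this block gets flagged
            return
        allocated.add(bp.offset)         # one reconstructed space-map entry
        for child in bp.children:
            walk(child, allocated, damaged)

    # One indirect block over two data blocks; then corrupt one of them.
    d1 = write_block(100, b"file data, part 1")
    d2 = write_block(200, b"file data, part 2")
    root = write_block(300, b"indirect", children=(d1, d2))
    BLOCKS[200] = b"bit rot"

    allocated, damaged = set(), []
    walk(root, allocated, damaged)
    print("allocated:", sorted(allocated))   # -> [100, 300]
    print("damaged:  ", damaged)             # -> [200]

The real work in a zfsck would be knowing the on-disk formats of the various object types, but the control flow is just this sort of tree walk, with the checksums doing the heavy lifting of deciding what is trustworthy.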
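And the "identify a relatively new root" part of the recovery tool boils down to something like the following. Again, the Uberblock layout and the helpers here are invented purely for illustration; the only point is "take the redundant root copies, throw away the ones that don't verify, and keep the newest of the rest":

    # Toy sketch: pick the newest root copy whose checksum still verifies.
    import hashlib
    from dataclasses import dataclass

    @dataclass
    class Uberblock:
        txg: int             # transaction group: higher means newer
        root_offset: int     # where the root block lives
        checksum: bytes      # checksum over (txg, root_offset)

    def make_ub(txg, root_offset, corrupt=False):
        payload = f"{txg}:{root_offset}".encode()
        csum = hashlib.sha256(payload).digest()
        if corrupt:
            csum = b"\x00" * 32          # simulate an unreadable copy
        return Uberblock(txg, root_offset, csum)

    def newest_valid_root(copies):
        """Return the valid copy with the highest txg, or None."""
        best = None
        for ub in copies:
            payload = f"{ub.txg}:{ub.root_offset}".encode()
            if hashlib.sha256(payload).digest() != ub.checksum:
                continue                 # damaged copy: skip it
            if best is None or ub.txg > best.txg:
                best = ub
        return best

    copies = [make_ub(40, 1000), make_ub(42, 1200, corrupt=True), make_ub(41, 1100)]
    ub = newest_valid_root(copies)
    print("recover from txg", ub.txg, "root at", ub.root_offset)   # -> txg 41

A real tool would then walk the chosen root with something like the loop in the first sketch and extract whatever verifies, falling back to an older txg if the newest tree turns out to be damaged.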