Eric Schrock wrote:
> On Fri, Oct 10, 2008 at 06:15:16AM -0700, Marcelo Leal wrote:
>> - "ZFS does not need fsck".
>> Ok, that's a great statement, but I think ZFS needs one. Really does.
>> And in my opinion an enhanced zdb would be the solution. Flexibility.
>> Options.
>
> About 99% of the problems reported as "I need ZFS fsck" can be summed up
> by two ZFS bugs:
>
> 1. If a toplevel vdev fails to open, we should be able to pull
>    information from necessary ditto blocks to open the pool and make
>    what progress we can. Right now, the root vdev code assumes "can't
>    open = faulted pool," which results in failure scenarios that are
>    perfectly recoverable most of the time. This needs to be fixed
>    so that pool failure is only determined by the ability to read
>    critical metadata (such as the root of the DSL).
>
> 2. If an uberblock ends up with an inconsistent view of the world (due
>    to failure of DKIOCFLUSHWRITECACHE, for example), we should be able
>    to go back to previous uberblocks to find a good view of our pool.
>    This is the failure mode described by Jeff.

I've mostly seen (2) because, despite all the best practices out there,
single-vdev pools are quite common. In every such case that I had my
hands on, it was possible to recover the pool by going back one or two
txgs.
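For orientation, a hedged sketch of inspecting that on-disk state with
zdb (pool name "tank" and the device path are placeholders; exact flags
and output vary by build):

    # dump the active uberblock of an imported pool; its txg field
    # shows which transaction group is the currently live view
    zdb -u tank

    # dump the four vdev labels on a device; each label carries a copy
    # of the pool config plus an array of uberblocks (which -l does not
    # yet dump; see 6720637 below)
    zdb -l /dev/dsk/c0t0d0s0

Going back a txg means activating one of the older entries in those
uberblock arrays; zdb itself only inspects, it does not rewrite.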
> These are both bugs in ZFS and will be fixed. The other 1% of the
> complaints are usually of the form "I created my pool on top of my old
> one" or "I imported a LUN on two different systems at the same time".

Of these two, the former is not easy, because it requires searching the
entire disk for root block candidates and trying each of them. The
latter is not catastrophic if there was little to no activity from one
of the systems. In that case, one of the first things to suffer is the
pool config object, and its corruption prevents pool open. Fortunately,
after the putback of

    6733970 assertion failure in dbuf_dirty() via spa_sync_nvlist()

in build 99, a corrupted pool config object is rewritten during open in
such a way that the old corrupted copy can no longer be read back in,
and in most cases this allows the pool to be imported and most of the
data saved. zdb is useful for understanding how much is corrupted and
how much was recovered. If nothing else is corrupted, the pool may
remain usable without being recreated. Again, in every case I had my
hands on, it was possible to either recover the pool completely or at
least save most of the data.

> It's unclear what a 'fsck' tool could do in this scenario, if anything.
> Due to a variety of reasons (hierarchical nature of ZFS, variable block
> sizes, RAID-Z, compression, etc), it's difficult to even *identify* a
> ZFS block, let alone determine its validity and associate it in some
> larger construct.

Indeed. In the "more ZFS recovery" case, involving a 42TB pool with
about 8TB used, zdb -bv alone took several hours to walk the block tree
and verify the consistency of block pointers, and zdb -bcv took a
couple of days to verify all the user data blocks as well. The
different checksums and gang blocks, on top of all the other dynamic
features mentioned, complicate the task of identifying ZFS blocks and
linking them into a tree, and make it really time- (and space-)
consuming.
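To make the scale concrete, these were the two invocations (a hedged
sketch; "tank" stands in for the real pool name, and the timings above
obviously depend on pool size and hardware):

    # traverse the block tree and verify block pointer consistency;
    # this reads metadata only, hence "only" several hours on 42TB
    zdb -bv tank

    # -c additionally reads and checksums the blocks themselves,
    # including user data, hence a couple of days on the same pool
    zdb -bcv tank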
> There are some interesting possibilities for limited forensic tools - in
> particular, I like the idea of a mdb backend for reading and writing ZFS
> pools[1]. But I haven't actually heard a reasonable proposal for what a
> fsck-like tool (i.e. one that could "repair" things automatically) would
> actually *do*, let alone how it would work in the variety of situations
> it needs to (compressed RAID-Z?) where the standard ZFS infrastructure
> fails.

There are a number of bugs and RFEs to improve the usefulness of zdb
for field use, e.g.:

6720637 want zdb -l option to dump uberblock arrays as well
6709782 issues running zdb with -p and -e options
6736356 zdb -R needs to work with exported pools
6720907 zdb should handle errors while dumping datasets and objects
6746101 zdb command to search for ZFS labels in a device
6757444 want zdb -R to support decompression, checksumming and raid-z
6757430 want an option for zdb to disable space map loading and leak tracking

Hth,
Victor

> - Eric
>
> [1]
> http://mbruning.blogspot.com/2008/08/recovering-removed-file-on-zfs-disk.html
>
> --
> Eric Schrock, Fishworks        http://blogs.sun.com/eschrock

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss