Robert Milkowski wrote:
Kevin Walker wrote:
Hi all,

Just subscribed to the list after a debate on our helpdesk led me to the posting about ZFS corruption and the need for an fsck-style repair tool of some kind...

Has there been any update on this?


I guess the discussion started after someone read an article on OSNEWS.

The way ZFS works, you basically get an fsck equivalent all the time while using a pool. ZFS verifies checksums for all metadata and user data as it reads them. On top of that, all metadata is stored using ditto blocks, which provide two or three copies (completely independent of any pool redundancy), depending on the type of metadata. If one copy is corrupted, a second (or third) copy is used, so correct data is returned and the corrupted block is automatically repaired. Whether a block containing user data can be repaired depends on whether the pool is configured with redundancy. But even if the pool is non-redundant (let's say a single disk drive), ZFS will still detect the corruption and tell you exactly which files are affected, and the metadata will still be correct in most cases (unless the corruption is so large and so widespread that it hits all copies of a block in the pool). You will still be able to read all the other files, and the unaffected parts of the damaged file.
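For instance, even on a single-disk pool you can ask ZFS which files were hit by a checksum error (the pool name below is just an example):

    # zpool status -v tank

The -v output lists files with permanent (unrepairable) errors, so you know exactly what to restore from backup instead of having to distrust the whole disk.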

So the fsck actually happens while you are accessing your data, and it is even better than fsck on most other filesystems: thanks to the checksumming of all data and metadata, ZFS knows exactly when and where something is wrong, and in most cases it can even fix it on the fly. If you want to scan the entire pool, including all redundant copies, and have anything that fails its checksum repaired, you can schedule a pool scrub (while your applications are still using the pool!). A scrub forces ZFS to read every block from every copy, verify its checksum, correct the data if possible, and report what it found to the user. A legacy fsck is not even close.
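Starting and then monitoring a scrub is just (again, the pool name is only an example):

    # zpool scrub tank
    # zpool status -v tank

The status output shows scrub progress plus any checksum errors found and repaired, and applications keep using the pool the whole time.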


I think the perceived need for an fsck for ZFS comes partly from a lack of understanding of how ZFS works, and partly from some frustrated users who, under very unlikely and rare circumstances, ended up unable to import a pool at all because of data corruption, and therefore unable to access any data, even though the corruption may have affected only a relatively small amount of it. Most other filesystems would let you access most of the data after an fsck in such a situation (probably with some data loss), while ZFS left the user with no access to the data at all.

In such a case the problem lies with the ZFS uberblock, and the remedy is to revert the pool to its previous uberblock (or an even earlier one). In almost all cases this makes the pool importable again, and then the mechanisms described in the first paragraph above kick in. The problem is (was) that the procedure for reverting a pool to one of its previous uberblocks was neither documented nor automatic, and was definitely far from sysadmin-friendly. But thanks to some community members (most notably Victor, I think), some users affected by the issue were given a hand and were able to recover most or all of their data. Others were probably assisted by Sun's support service, I guess.

Fortunately, a much more user-friendly mechanism has finally been implemented and integrated into OpenSolaris build 126, which allows a user to import a pool and force it back to one of the previous versions of its uberblock if necessary. See http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html for more details.
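If I read the putback correctly, this is exposed as a recovery option to zpool import, so an otherwise unimportable pool can be brought back with something along these lines (pool name is an example; -n should do a dry run that only reports whether recovery would succeed):

    # zpool import -Fn tank
    # zpool import -F tank

The rewind sacrifices the last few seconds of transactions instead of losing access to the whole pool.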

There is another CR (I don't have its number at hand) about implementing delayed re-use of just-freed blocks, which should allow even more data to be recovered in a case like the one above. I'm not sure whether it has been implemented yet, though.

IMHO, with the above CR implemented, in most cases ZFS provides a *much* better answer to random data corruption than any other filesystem+fsck on the market.

The code putback for PSARC 2009/479 allows reverting to an earlier uberblock AND defers the re-use of freed blocks for a short time to make this "rewind" safer.

-tim
