Robert Milkowski wrote:
Kevin Walker wrote:
Hi all,

Just subscribed to the list after a debate on our helpdesk led me to the posting about ZFS corruption and the need for an fsck-style repair tool of some kind...

Has there been any update on this?


I guess the discussion started after someone read an article on OSNEWS.

The way ZFS works, you basically get an fsck equivalent all the time while using a pool. ZFS verifies checksums for all metadata and user data as it reads them. On top of that, all metadata is stored using ditto blocks, which provide two or three copies (completely independent of any pool redundancy), depending on the type of metadata. If one copy is corrupted, a second (or third) copy is used, so correct data is returned and the corrupted block is automatically repaired. Whether a block containing user data can be repaired depends on whether the pool is configured with redundancy. But even if the pool is non-redundant (let's say a single disk drive), ZFS will still detect the corruption and tell you exactly which files are affected, and the metadata will still be correct in most cases (unless the corruption is so large and so widespread that it hits all copies of a block in the pool). You will still be able to read all the other files, and the unaffected parts of the damaged file.
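For instance, even on a single-disk pool you can ask ZFS which files were hit by a checksum error (the pool name below is just an example):

    # zpool status -v tank

The -v output lists files with permanent (unrepairable) errors, so you know exactly what to restore from backup instead of having to distrust the whole disk.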

So the fsck actually happens while you are accessing your data, and it is even better than fsck on most other filesystems: thanks to the checksumming of all data and metadata, ZFS knows exactly when and where something is wrong, and in most cases it can even fix it on the fly. If you want to scan the entire pool, including all redundant copies, and have anything that fails its checksum repaired, you can schedule a pool scrub (while your applications are still using the pool!). A scrub forces ZFS to read every block from every copy, verify its checksum, correct the data if possible, and report what it found to the user. A legacy fsck is not even close.
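Starting and then monitoring a scrub is just (again, the pool name is only an example):

    # zpool scrub tank
    # zpool status -v tank

The status output shows scrub progress plus any checksum errors found and repaired, and applications keep using the pool the whole time.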


I think the perceived need for an fsck for ZFS comes partly from a lack of understanding of how ZFS works, and partly from some frustrated users who, under very unlikely and rare circumstances, ended up unable to import a pool at all because of data corruption, and therefore unable to access any data, even though the corruption may have affected only a relatively small amount of it. Most other filesystems would let you access most of the data after an fsck in such a situation (probably with some data loss), while ZFS left the user with no access to the data at all.

In such a case the problem lies with the ZFS uberblock, and the remedy is to revert the pool to its previous uberblock (or an even earlier one). In almost all cases this makes the pool importable again, and then the mechanisms described in the first paragraph above kick in. The problem is (was) that the procedure for reverting a pool to one of its previous uberblocks was neither documented nor automatic, and was definitely far from sysadmin-friendly. But thanks to some community members (most notably Victor, I think), some users affected by the issue were given a hand and were able to recover most or all of their data. Others were probably assisted by Sun's support service, I guess.

Fortunately, a much more user-friendly mechanism has finally been implemented and integrated into OpenSolaris build 126, which allows a user to import a pool and force it back to one of the previous versions of its uberblock if necessary. See http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html for more details.
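If I read the putback correctly, this is exposed as a recovery option to zpool import, so an otherwise unimportable pool can be brought back with something along these lines (pool name is an example; -n should do a dry run that only reports whether recovery would succeed):

    # zpool import -Fn tank
    # zpool import -F tank

The rewind sacrifices the last few seconds of transactions instead of losing access to the whole pool.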

There is another CR (I don't have its number at hand) about implementing delayed re-use of just-freed blocks, which should allow even more data to be recovered in a case like the one above. I'm not sure whether it has been implemented yet, though.

IMHO, with the above CR implemented, in most cases ZFS provides a *much* better answer to random data corruption than any other filesystem+fsck on the market.

The code putback for PSARC 2009/479 allows reverting to an earlier uberblock AND defers the re-use of freed blocks for a short time to make this "rewind" safer.

-tim
