Robert Milkowski wrote:
Kevin Walker wrote:
Hi all,
Just subscribed to the list after a debate on our helpdesk led me to
the posting about ZFS corruption and the need for an fsck repair tool
of some kind...
Has there been any update on this?
I guess the discussion started after someone read an article on OSNEWS.
The way ZFS works is that you basically get an fsck equivalent all the
time, while the pool is in use.
ZFS verifies checksums on all metadata and user data as it is read.
In addition, all metadata uses ditto blocks to provide two or three
copies of it (completely independent of any pool redundancy), depending
on the type of metadata. If a copy is corrupted, a second (or third)
copy is used, so correct data is returned and the corrupted block is
automatically repaired. The ability to repair a block containing user
data depends on whether the pool is configured with redundancy. But
even if the pool is non-redundant (let's say a single disk drive), ZFS
will still be able to detect the corruption and tell you which files
are affected, while the metadata will remain correct in most cases
(unless the corruption is so large and widespread that it affects all
copies of a block in the pool). You will still be able to read all
other files and the unaffected parts of the damaged file.
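When that happens on a non-redundant pool, 'zpool status -v' lists the
affected files by name, for example (the pool name 'tank' here is just
an illustration):

  # zpool status -v tank

The errors: section of the output names each file with unrecoverable
damage; everything else remains readable.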
So the fsck actually happens while you are accessing your data, and it
is even better than fsck on most other filesystems: thanks to the
checksumming of all data and metadata, ZFS knows exactly when something
is wrong and in most cases can even fix it on the fly.
If you want to scan the entire pool, including all redundant copies,
and have anything that doesn't checksum fixed, you can schedule a pool
scrub (while your applications are still using the pool!). This forces
ZFS to read every block from every copy, verify its checksum, correct
the data where possible, and report the results to the user. A legacy
fsck is not even close to this.
I think the perceived need for an fsck for ZFS comes partly from a lack
of understanding of how ZFS works, and partly from some frustrated
users who, under very unlikely and rare circumstances, ended up unable
to import their pool due to data corruption, and therefore unable to
access any data at all, even though the corruption may have affected
only a relatively small amount of data. Most other filesystems would
let you access most of the data after an fsck in such a situation
(probably with some data loss), while ZFS left the user with no access
to the data at all. In such a case the problem lies with the ZFS
uberblock, and the remedy is to revert the pool to its previous
uberblock (or an even earlier one). In almost all cases this will make
the pool importable again, and then the mechanisms described in the
first paragraph above kick in. The problem is (was) that the procedure
to revert a pool to one of its previous uberblocks was neither
documented nor automatic, and was definitely far from sysadmin
friendly. But thanks to some community members (most notably Mr. Victor,
I think) some users affected by the issue were given a hand and were
able to recover most or all of their data. Others were presumably
assisted by Sun's support service.
Fortunately, a much more user-friendly mechanism has finally been
implemented and integrated into OpenSolaris build 126, which allows a
user to import a pool and force it back to one of the previous versions
of its uberblock if necessary. See
http://c0t0d0s0.org/archives/6067-PSARC-2009479-zpool-recovery-support.html
for more details.
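In practice this is exposed through 'zpool import': the -F flag asks
for a recovery import, and combining it with -n does a dry run first.
For example (pool name 'tank' is illustrative):

  # zpool import -Fn tank   (report whether discarding the last few
                             transactions would make the pool
                             importable, without changing anything)
  # zpool import -F tank    (perform the recovery import)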
There is another CR (I don't have its number at hand) about
implementing delayed re-use of just-freed blocks, which should allow
more data to be recovered in a case like the above. I'm not sure
whether it has been implemented yet, though.
IMHO, with the above CR implemented, in most cases ZFS provides a
*much* better answer to random data corruption than any other
filesystem+fsck on the market.
The code for the putback of 2009/479 allows reverting to an earlier uberblock
AND defers the re-use of blocks for a short time to make this "rewind" safer.
-tim