Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

Jim Klimov Thu, 12 Jan 2012 18:27:44 -0800

2012-01-13 5:30, Daniel Carosone wrote:

On Thu, Jan 12, 2012 at 05:01:48PM -0800, Richard Elling wrote:

This thread is about checksums - namely, now, what are
our options when they mismatch the data? As many blog posts
that dig into ZDB have reported, there are cases where the
checksum is broken (i.e. bitrot in block pointers, or rather
in RAM while the checksum was calculated - so each ditto copy
of the BP carries the error), but the file data is in fact
intact (extracted from disk with ZDB or DD, and compared to
other copies).


Metadata is at least doubly redundant and checksummed.


The implication is that the original calculation of the checksum was
bad in ram (undetected due to lack of ECC), and then written out
redundantly and fed as bad input to the rest of the merkle construct.
The data blocks on disk are correct, but they fail to verify against
the bad metadata.


Implication is correct, that was the outlined scenario :)
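
To make the outlined scenario concrete, here is a minimal Python sketch. It is not ZFS code: SHA-256 merely stands in for whatever checksum the pool uses, and every name in it (checksum, flip_bit, block_pointer_copies and so on) is made up for illustration. It assumes a single bit of the checksum flips in RAM before the block pointer is written, so all ditto copies carry the same bad value while the data on disk stays correct.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Stand-in for the block checksum (SHA-256 chosen for illustration)."""
    return hashlib.sha256(data).digest()


def flip_bit(buf: bytes, bit: int) -> bytes:
    """Return a copy of buf with one bit inverted (simulated RAM bit flip)."""
    out = bytearray(buf)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def verify(data: bytes, stored_sum: bytes) -> bool:
    """Check a data block against the checksum kept in a block pointer."""
    return checksum(data) == stored_sum


# The data block reaches the disk intact.
data_on_disk = b"payload written correctly to disk" * 4

# The checksum is computed correctly, then damaged in non-ECC RAM
# before the block pointer is written out.
bad_sum = flip_bit(checksum(data_on_disk), 5)

# Hypothetical model: every ditto copy of the block pointer is written
# from the same RAM buffer, so the redundancy replicates the error.
block_pointer_copies = [bad_sum, bad_sum, bad_sum]

# Each copy of the metadata rejects the (valid) data block.
results = [verify(data_on_disk, bp) for bp in block_pointer_copies]

# An independent copy (e.g. a backup) still matches byte for byte.
independent_copy = bytes(data_on_disk)
data_matches_other_copy = data_on_disk == independent_copy

print(results, data_matches_other_copy)
```

In this toy model the redundancy of the metadata does not help, because the error was introduced before the copies were made.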

The complaint appears to be that ZFS makes this 'worse' because the
(independently verified) valid data blocks are inaccessible.


Also correct, a frequent "woe" (generally in the context
of discussions about the lack of a ZFS fsck, though many of these
discussions tend to descend into flame wars and/or detailed
descriptions of how the COW and transaction engine keep
{meta}data intact - right up until some fatal bit rot leaves
recreating the pool as the only "recovery" option).

Worse than what?


Worse than not having a (relatively easy-to-use) ability
to tell the system which part to trust - the data or the
checksum (which returns us to the subject of automating this
with ECC and/or other checksums). My data, my checks into it,
my word should be final in case of dispute ;)
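
As a rough sketch of how such a decision might be partly automated, the Python below classifies a mismatch. It is an illustration only, not anything ZFS provides: SHA-256 stands in for the pool checksum, and diagnose, hamming, flip_bit and the max_flips threshold are hypothetical names and values. It relies on the fact that a hash with good avalanche behaviour changes about half of its output bits when the input changes, so a stored checksum that is only a bit or two away from the computed one points at damage to the checksum itself.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def flip_bit(buf: bytes, bit: int) -> bytes:
    """Return a copy of buf with one bit inverted."""
    out = bytearray(buf)
    out[bit // 8] ^= 1 << (bit % 8)
    return bytes(out)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def diagnose(data: bytes, stored: bytes, max_flips: int = 2) -> str:
    """Classify a checksum mismatch (hypothetical helper, not a ZFS API)."""
    computed = sha256(data)
    if computed == stored:
        return "ok"
    # Stored and computed digests nearly identical: the stored checksum
    # itself most likely took a bit flip.
    if hamming(computed, stored) <= max_flips:
        return "checksum-bitflip"
    # Otherwise try every single-bit change to the data and see whether
    # one of them reproduces the stored checksum.
    buf = bytearray(data)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)
        if sha256(bytes(buf)) == stored:
            return "data-bitflip"
        buf[i // 8] ^= 1 << (i % 8)  # undo and try the next bit
    return "undetermined"


block = b"zfs block payload " * 4
print(diagnose(block, flip_bit(sha256(block), 7)))
```

Note that "data-bitflip" only says the data and the checksum disagree by one data bit. It cannot tell whether the copy on disk rotted or the copy in RAM was wrong when the checksum was calculated, so the final word would still belong to the admin.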

Corrupted file data that is then accurately
checksummed and readable as valid? Accurate data that is read without
any assertion of validity, in a traditional filesystem?


If done by the ZFS automata themselves - without my ability to
intervene - then probably not. It would make ZFS no better than others.

There's an inherent value judgement here that will vary by judge, but in
each case it's as much a judgement on the value of ECC and reliable
hardware, and your data and time enacting various kinds of recovery,
as it is the value of ZFS.


Perhaps so. I might read through a text file to see if it
is garbage or text. I might parse or display image files
and many other formats. I might compare to another copy, if
available. I just don't have a mechanism to do so with ZFS.

Apparently, a view into the data "as it seems to be" without
checksums would speed up the process of data comparison,
eye-reading and other methods of validation.

People do that with LOST+FOUND and similar directories
on other FSes, but usually after an irreversible recovery
attempt, correct or not...

Heck, with ZFS I might have a snapshot-like view at my
recovery options (accessible to programs like image viewers)
without changing on-disk data until I pick a variant.

Yes, okay, ZFS did inform me of some inconsistency
(even then it is not necessarily the data that is bad)
and perhaps prompted me to fix the hardware and find
other copies of data. Kudos to the team, really!
But then it stops here, without providing me with
options to recover whatever is on disk (at my risk).

As a Solaris example, admins are allowed to confirm
which part of a broken UFS+SVM mirror to trust, even
if there is not a quorum set of metadb replicas.

This trust in the human is common in the industry, and
allows accounting for whatever could not be done in the
software as a one-size-fits-all solution. Also it is
the user's final choice to kill or save the data, not
the programmer's, whatever cryptic intentions he had.


The same circumstance could, in principle, happen due to bad CPU even
with ECC.  In either case, the value of ZFS includes that an error has
been detected you would otherwise have been unaware of, and you get a
clue that you need to fix hardware and spend time.


True, whenever that is possible.
Hardware will be faulty, always. We can only decrease the
extent of that. Not all implementation options (see laptops
and ECC RAM) or budgets can fix it to "reasonable" levels,
though.

Software must be the more resilient part, I guess - as long
as its error-detection algorithm can execute on that CPU... :)

//Jim




_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
