Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

Jim Klimov Thu, 12 Jan 2012 14:36:14 -0800

I guess I have another practical rationale for a second
checksum, be it ECC or not: a scrub of my pool found some
"unrecoverable errors". Luckily, for those files I still
have external originals, so I rsynced them over. Still,
there is one file whose broken earlier versions are
referenced in snapshots, and properly fixing that would
probably require me to resend the whole stack of snapshots.
That's uncool, but a subject for another thread.


This thread is about checksums - namely, what are our
options when they mismatch the data? As has been reported
in many blog posts researching ZDB, there are cases where
the checksums are broken (e.g. bitrot in block pointers,
or rather in RAM while the checksum was being calculated -
so each ditto copy of the BP carries the error), but the
file data is in fact intact (extracted from disk with ZDB
or DD, and compared to other copies).

For these cases bloggers asked (in vain): why is an admin
not allowed to confirm the validity of end-user data and
have the system reconstruct (re-checksum) the metadata
for it? IMHO, that's a valid RFE.

While the system is scrubbing, I have been reading up on
theory. I found a nice text, "Keeping Bits Safe: How Hard
Can It Be?" by David Rosenthal [1], where I stumbled upon
an interesting thought:
  The bits forming the digest are no different from the
  bits forming the data; neither is magically incorruptible.
  ...Applications need to know whether the digest has
  been changed.

In our case, where the original checksum in the blockpointer
could be corrupted in the (non-ECC) RAM of my home NAS just
before it was dittoed to disk, another checksum - a copy
of this same one, or a differently calculated one - could
provide ZFS with the means to determine whether the data
or one of the checksums got corrupted (or all of them).
Of course, this is not an absolute protection method,
but it can reduce the cases where pools have to be
"destroyed, recreated and recovered from tape".

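To make the reasoning above concrete, here is a minimal
sketch in Python. It is NOT how ZFS works today; the
function names, the choice of SHA-256 plus CRC32 as the two
independently calculated checksums, and the verdict strings
are all assumptions made up for illustration. It only shows
how a second checksum lets us tell "data is bad" apart from
"one stored checksum is bad".

```python
import hashlib
import zlib


def primary_sum(data: bytes) -> bytes:
    # Stand-in for the checksum kept in the block pointer (assumption).
    return hashlib.sha256(data).digest()


def secondary_sum(data: bytes) -> int:
    # Hypothetical second, differently calculated checksum (assumption).
    return zlib.crc32(data)


def diagnose(data: bytes, stored_primary: bytes, stored_secondary: int) -> str:
    """Compare the block against both stored checksums and guess the culprit."""
    primary_ok = primary_sum(data) == stored_primary
    secondary_ok = secondary_sum(data) == stored_secondary
    if primary_ok and secondary_ok:
        return "consistent"
    if secondary_ok:
        # Data still matches one checksum: the other one is the likely victim.
        return "primary checksum suspect, data likely intact"
    if primary_ok:
        return "secondary checksum suspect, data likely intact"
    # Nothing agrees: the data is bad, or both checksums are.
    return "data corrupt, or both checksums corrupt"


if __name__ == "__main__":
    block = b"written-once userdata " * 64
    good_p, good_s = primary_sum(block), secondary_sum(block)

    # Simulate a bit flip in RAM that hit only the primary checksum.
    bad_p = bytes([good_p[0] ^ 0x01]) + good_p[1:]
    print(diagnose(block, bad_p, good_s))

    # Simulate a bit flip in the data itself.
    bad_block = bytes([block[0] ^ 0x01]) + block[1:]
    print(diagnose(bad_block, good_p, good_s))
```

As said above, this is not absolute protection: if nothing
matches, the sketch cannot tell corrupt data from two
corrupt checksums.
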
It is my belief that using dedup contributed to my issue -
there is a lot more updating of the block pointers and their
checksums, so it gradually becomes more likely that the
metadata (checksum) blocks get broken (e.g. in non-ECC
RAM), while the written-once userdata remains intact...

--
[1] http://queue.acm.org/detail.cfm?id=1866298
While the text discusses what most ZFSers already know -
bit-rot, MTTDL and such - it does so in great detail and
with many examples, and it gave me a better understanding
of it all even though I have been dealing with this for
several years now. A good read, I suggest it to others ;)

//Jim Klimov
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
