On 2012-01-14 18:36, Stefan Ring wrote:
Inspired by the paper "End-to-end Data Integrity for File Systems: A
ZFS Case Study" [1], I've been wondering whether it is possible to
devise a scenario in which a minimal in-memory data corruption would
cause massive data loss. I could imagine a case where an entire
directory branch drops off the tree structure, for example. Since I
know too little about ZFS's structure, I'm also asking myself whether
it is possible to make old snapshots disappear via memory corruption,
or to lose data blocks to leakage (blocks not containing data, yet
not marked as available).

I'd appreciate it if someone with a good understanding of ZFS's
internals and principles could comment on the possibility of such
scenarios.

[1] http://www.usenix.org/event/fast10/tech/full_papers/zhang.pdf

By no means am I an expert like the ones you seek, but I'm asking
similar questions, and more keep popping up ;)

I do have some reported corruptions on my non-ECC system despite
raidz2 on disk, so I have a keen interest in how this stuff works
and why it sometimes doesn't ;)

As for block leakage, judging by the error messages I'm seeing
now, leaked blocks (and related inconsistencies) are at least
anticipated and checked for: "allocating allocated segment" and
"freeing free segment". How my system got there - that's the puzzle...
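For illustration, here is a toy sketch of that kind of consistency
check. This is not the actual ZFS space map code - the bitmap,
segment granularity, and function names are all made up - but the
idea matches the messages above: the allocator tracks which segments
it believes are in use and fails loudly when asked to allocate an
already-allocated segment or to free an already-free one.

/*
 * Toy sketch, NOT actual ZFS code: an allocator that sanity-checks
 * its own in-core state, analogous to the messages quoted above.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define	NSEGS	1024
static bool allocated[NSEGS];	/* in-core map: true = segment in use */

static void
seg_alloc(int seg)
{
	if (allocated[seg]) {
		/* leaked or double-referenced block detected */
		fprintf(stderr, "allocating allocated segment %d\n", seg);
		assert(!"allocating allocated segment");
	}
	allocated[seg] = true;
}

static void
seg_free(int seg)
{
	if (!allocated[seg]) {
		fprintf(stderr, "freeing free segment %d\n", seg);
		assert(!"freeing free segment");
	}
	allocated[seg] = false;
}

int
main(void)
{
	seg_alloc(42);
	seg_free(42);
	seg_free(42);	/* trips the "freeing free segment" check */
	return (0);
}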

It does seem possible that in-memory corruption of a block's data
payload and/or checksum before it is written to disk would render it
invalid on read (the data doesn't match the checksum, so ZFS returns
EIO). It may be even worse if the in-memory block is corrupted before
the checksumming: seemingly valid garbage gets stored on disk, read
back later, and used with blind trust.
If it is a leaf block (userdata), you just get a corrupted file.
If it is a metadata block, and the corruption happened before it was
ditto-written to several disk locations, you're in trouble.
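To make that window concrete, here is a toy illustration. It is not
ZFS code: the checksum below is a trivial stand-in for fletcher4 or
sha256, and the block layout is invented. A bit flip after the
checksumming is caught on read; a bit flip before the checksumming
produces garbage that verifies cleanly.

/*
 * Toy sketch of the write-path corruption window, NOT actual ZFS code.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct block {
	uint8_t  data[512];
	uint64_t cksum;		/* stored alongside the data */
};

static uint64_t
checksum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return (sum);
}

int
main(void)
{
	struct block b;

	/* Case 1: bit flip AFTER checksumming -> caught on read (EIO). */
	memset(b.data, 0xAB, sizeof (b.data));
	b.cksum = checksum(b.data, sizeof (b.data));
	b.data[100] ^= 0x01;			/* in-RAM corruption */
	printf("case 1: %s\n",
	    checksum(b.data, sizeof (b.data)) == b.cksum ?
	    "trusted (bad!)" : "mismatch -> EIO (detected)");

	/* Case 2: bit flip BEFORE checksumming -> garbage looks valid. */
	memset(b.data, 0xAB, sizeof (b.data));
	b.data[100] ^= 0x01;			/* corruption first... */
	b.cksum = checksum(b.data, sizeof (b.data)); /* ...then checksummed */
	printf("case 2: %s\n",
	    checksum(b.data, sizeof (b.data)) == b.cksum ?
	    "trusted (bad!)" : "mismatch -> EIO (detected)");
	return (0);
}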

It is likewise possible that in-RAM data gets corrupted after being
read from disk and checksum-verified, but before it is actually used
as a metadata block or whatever.

If you're "lucky" enough to have corruption that is irreparable (by
ditto blocks) in a metadata block near the root of a tree, you may
find yourself in serious trouble.

In all these cases RAM is the SPOF (single point of failure), so
all ZFS recommendations involve using ECC systems. Alas, even though
ECC chips and chipsets are cheap nowadays, not all architectures use
them (e.g. desktops, laptops, etc.), and the tagline of running ZFS
for "reliable storage on consumer-grade hardware" is poisoned by this
fact. Other filesystems obviously suffer just as much from bad
components, but ZFS reports the errors it detects; and unlike systems
that let you dismiss the errors (e.g. by freeing all blocks and files
touched by a corrupt block, leaving you with a smaller but consistent
tree of data blocks), or that don't even notice them, ZFS tends to
get really upset about many of them and ask for recovery from
backups (as if those were 100% reliable).

I do wonder, however, whether it is possible to implement a software
ECC to detect and/or repair small memory corruptions on consumer-grade
systems. And where would such a piece fit - in ZFS (i.e. some ECC
bits appended to every zfs_*_t structure) or in the {Solaris} kernel,
for general VM management? And even then there is the question of
whether this would solve more problems than it creates - it could
present the appearance of a solution while hiding problems that
actually exist (because some parts of the data path would still be
non-ECC, and the GIGO principle can apply at any point). In the bad
case, you ECC an already-invalid piece of memory and afterwards trust
it because it matches the checksum. On the good side, there is a
smaller window during which data is exposed unprotected, so
statistically this solution should help.
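To make the idea concrete, here is a minimal sketch using a classic
Hamming(12,8) single-error-correcting code over one byte. It is
purely illustrative and not tied to any actual ZFS or kernel
structure - the encoding, function names, and example values are all
my own. A practical implementation would use a wider code (e.g.
SECDED over 64-bit words, as hardware ECC does) to cut the overhead
from 50% to about 12.5%, and would add an overall parity bit so that
double-bit errors are detected rather than silently miscorrected.

/*
 * Toy Hamming(12,8) sketch: corrects any single bit flip in a stored
 * byte. Positions 1,2,4,8 hold parity bits; the rest hold data bits.
 */
#include <stdint.h>
#include <stdio.h>

static const int dpos[8] = { 3, 5, 6, 7, 9, 10, 11, 12 };

static uint16_t
hamming_encode(uint8_t data)
{
	uint16_t cw = 0;

	for (int i = 0; i < 8; i++)
		if (data & (1 << i))
			cw |= 1 << dpos[i];
	/* set each parity bit so its covered positions have even parity */
	for (int p = 1; p <= 8; p <<= 1) {
		int parity = 0;
		for (int pos = 1; pos <= 12; pos++)
			if ((pos & p) && (cw & (1 << pos)))
				parity ^= 1;
		if (parity)
			cw |= 1 << p;
	}
	return (cw);
}

static uint8_t
hamming_decode(uint16_t cw)
{
	int syndrome = 0;
	uint8_t data = 0;

	/* XOR of the positions of all set bits; 0 means "no error seen" */
	for (int pos = 1; pos <= 12; pos++)
		if (cw & (1 << pos))
			syndrome ^= pos;
	if (syndrome >= 1 && syndrome <= 12)
		cw ^= 1 << syndrome;	/* repair the single-bit error */
	/* (syndrome 13..15 would mean an uncorrectable multi-bit error) */
	for (int i = 0; i < 8; i++)
		if (cw & (1 << dpos[i]))
			data |= 1 << i;
	return (data);
}

int
main(void)
{
	uint16_t cw = hamming_encode(0x5A);

	cw ^= 1 << 7;	/* simulate a single in-RAM bit flip */
	printf("stored 0x5A, recovered 0x%02X after one bit flip\n",
	    hamming_decode(cw));
	return (0);
}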


HTH,
//Jim Klimov