Frank Middleton <f.middle...@apogeect.com> writes:

> Exactly. My whole point. And without ECC there's no way of knowing.
> But if the data is damaged /after/ checksum but /before/ write, then
> you have a real problem...

we can't do much to protect ourselves from damage to the data itself
(an extra copy in RAM will help little and ruin performance).

damage to the bits holding the computed checksum before it is written
can be alleviated by doing the calculation independently for each
written copy.  in particular, this will help if the bit error is
transient.
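a sketch of the per-copy scheme (illustrative Python only; SHA-256
stands in for whatever checksum the pool uses, and the list-based
"devices" are a toy stand-in for vdevs):

```python
import hashlib

def write_with_independent_checksums(record: bytes, devices):
    """Write one copy of `record` to each device, recomputing the
    checksum independently per copy.  A transient bit flip that hits
    the checksum buffer during one computation can then corrupt only
    that single copy; the other copies still carry a good checksum."""
    for dev in devices:
        # recompute from the source data for every copy, rather than
        # reusing one cached digest for all of them
        digest = hashlib.sha256(record).digest()
        dev.append((digest, record))

# toy two-way mirror: plain lists standing in for devices
mirror = [[], []]
write_with_independent_checksums(b"some record", mirror)
```

with no transient error during either computation, both copies end up
with identical digests; the point of recomputing is what happens when
one computation goes wrong.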

since the number of octets in RAM occupied by data dwarfs the number
holding the checksum by a large ratio (one mebibit for a full default
sized record vs. 256 bits), such a paranoia mode will most likely tell
you that the *data* is corrupt, not the checksum.  but today you don't
know at all, so it's an improvement in my book.
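the ratio works out as follows (a sketch of the arithmetic, assuming a
256-bit checksum such as SHA-256 and the default 128 KiB recordsize):

```python
checksum_bits = 256              # e.g. SHA-256 digest
record_bits = 128 * 1024 * 8     # default 128 KiB recordsize = one mebibit

# all else being equal, a random bit flip is this many times more
# likely to land in the data than in its checksum
ratio = record_bits // checksum_bits
print(ratio)  # 4096
```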

> Quoting the ZFS admin guide: "The failmode property ... provides the
> failmode property for determining the behavior of a catastrophic
> pool failure due to a loss of device connectivity or the failure of
> all devices in the pool. ". Has this changed since the ZFS admin
> guide was last updated?  If not, it doesn't seem relevant.

I guess checksum error handling is orthogonal to this and should have
its own property.  it sure would be nice if the admin could ask the OS
to deliver the bits contained in a file, no matter what, and just log
the problem.

> Cheers -- Frank

thank you for pointing out this potential weakness in ZFS' consistency
checking; I didn't realise it was there.

also thank you, all ZFS developers, for your great job :-)

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
