On Oct 2, 2009, at 3:36 PM, Miles Nordin wrote:
"re" == Richard Elling <richard.ell...@gmail.com> writes:
re> By your logic, SECDED ECC for memory is broken because it only corrects
ECC is not a checksum.
SHA-256 is not a checksum, either, but that isn't the point. The
concern is that corruption can be detected. ECC has very, very
limited detection capabilities, yet it is "good enough" for many
people. We know that MOS memories have certain failure modes that
cause bit flips, and by using ECC and interleaving, the dependability
is improved. The big question is: what does corrupted data look like
in storage? Random bit flips? Big chunks of zeros? 55aa patterns?
Since the concern with the broken fletcher2 is restricted to the most
significant bits, we are most concerned with failures where the most
significant bits are set to ones. But as I said, we have no real idea
what the corrupted data should look like, and if it is zero-filled,
then fletcher2 will catch it.
Go ahead, get out your dictionary, enter severe-pedantry-mode. But it
is relevantly different. In data-transmission scenarios, for example,
FECs like ECC are often used along with a strong non-correcting
checksum over a larger block.
The OP further described scenarios plausible for storage, like
``long string of zeroes with 1 bit flipped'', that produce collisions
with the misimplemented fletcher2 (but, obviously, not with any
strong checksum like correct-fletcher2).
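To make that concrete, here is a small standalone sketch, modeled on
the fletcher_2_native() loop in the OpenSolaris zfs_fletcher.c of
that era (the test harness and buffer contents are my own
illustration, not the verbatim source). Every addition is done
modulo 2^64, so carries out of the top bit simply vanish, and a
block of zeroes corrupted by setting the most significant bit of two
words checksums identically to the clean block:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Two interleaved second-order sums over 64-bit words; all
       additions wrap modulo 2^64, silently dropping carries. */
    static void
    fletcher2_zfs(const uint64_t *ip, size_t nwords, uint64_t ck[4])
    {
        uint64_t a0 = 0, a1 = 0, b0 = 0, b1 = 0;

        for (size_t i = 0; i + 1 < nwords; i += 2) {
            a0 += ip[i];
            a1 += ip[i + 1];
            b0 += a0;
            b1 += a1;
        }
        ck[0] = a0; ck[1] = a1; ck[2] = b0; ck[3] = b1;
    }

    int
    main(void)
    {
        uint64_t zeros[8] = { 0 }, corrupt[8] = { 0 };
        uint64_t c1[4], c2[4];

        /* Corruption that sets the top bit of words 0 and 4:
           every contribution is a multiple of 2^63 that cancels
           mod 2^64 in all four sums. */
        corrupt[0] = corrupt[4] = 1ULL << 63;

        fletcher2_zfs(zeros, 8, c1);
        fletcher2_zfs(corrupt, 8, c2);

        /* Prints "collision": the damage is invisible. */
        printf("%s\n", memcmp(c1, c2, sizeof (c1)) ? "detected"
                                                   : "collision");
        return (0);
    }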
re> is fletcher2 "good enough" for storage?
yes, it probably is good enough, but ZFS implements some other broken
algorithm and calls it fletcher2. so, please stop saying fletcher2.
If I were to refer to Fletcher's algorithm, I would use Fletcher.
When I am referring to the ZFS checksum setting of "fletcher2", I
will continue to use "fletcher2".
re> I'll blame the lawyers. They are causing me to remove certain
re> words from my vocabulary :-(
yeah, well, allow me to add a word back to the vocabulary: BROKEN.
If you are not legally allowed to use words like broken and working,
then find another identity from which to talk, please.
re> Question for the zfs-discuss participants, have you seen a
re> data corruption that was not detected when using fletcher2?
This is ridiculous. It's not fletcher2, it's brokenfletcher2. It's
extremely weak, and avoidably so. It's reasonable to want to use a
real checksum, and this PR game you are playing is frustrating and
confidence-harming for people who want that.
There is no PR campaign. It is what it is. What is done is done.
This does not have to become a big deal, unless you try to spin it
with a 7200rpm PR machine like IBM did with their broken Deathstar
drives before they became HGST.
Please, what we need to do is admit that the checksum is relevantly
broken in a way that compromises the integrity guarantees with which
ZFS was sold to many customers, fix the checksum, and learn how to
conveniently migrate our data.
Unfortunately, there is a backwards compatibility issue that
requires the current fletcher2 to live for a very long time. The
only question for debate is whether it should be the default.
To date, I see no field data that suggests it is not detecting
corruption.
Based on the table you posted, I guess the checksum for file data can
be set to fletcher4 or sha256 via filesystem properties, to work
around the bug on Solaris versions with the broken implementation.
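For example (dataset name hypothetical), and keeping in mind that the
property only governs newly written blocks, not what is already on
disk:

    # zfs set checksum=sha256 tank/home
    # zfs get checksum tank/home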
1. What's needed to avoid fletcher2 on the ZIL on broken Solaris
versions?
Please file RFEs at bugs.opensolaris.org
2. I understand the workaround, but not the fix.
How does the fix included in S10u8 and snv_114 work? Is there a
ZFS version bump? Does the fix work by implementing fletcher2
correctly, or does it just disable fletcher2 and force everything
to use brokenfletcher4, which is good enough? If the former, how
are the broken and correct versions of fletcher2
distinguished---do they show up with different names in the pool
properties?
As best I can tell, the comments were changed to indicate fletcher2
is deprecated. However, it must live on (forever) because of
backwards compatibility. I presume one day the default will change to
fletcher4 or something else. This is implied by zfs(1m):
     checksum=on | off | fletcher2 | fletcher4 | sha256

       Controls the checksum used to verify data integrity. The
       default value is on, which automatically selects an
       appropriate algorithm (currently, fletcher2, but this may
       change in future releases). The value off disables
       integrity checking on user data. Disabling checksums is
       NOT a recommended practice.
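For contrast, here is a sketch of why brokenfletcher4 holds up much
better, again modeled on my reading of the fletcher_4_native() loop
in zfs_fletcher.c (the harness and buffers are mine): it sums 32-bit
words into 64-bit accumulators, so the carries that fletcher2 throws
away land harmlessly in the accumulators' upper halves, and the
top-bit corruption from the earlier example is caught:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Four running sums of 32-bit words in 64-bit accumulators;
       the low-order sums a and b never drop carries. */
    static void
    fletcher4_zfs(const uint32_t *ip, size_t nwords, uint64_t ck[4])
    {
        uint64_t a = 0, b = 0, c = 0, d = 0;

        for (size_t i = 0; i < nwords; i++) {
            a += ip[i];
            b += a;
            c += b;
            d += c;
        }
        ck[0] = a; ck[1] = b; ck[2] = c; ck[3] = d;
    }

    int
    main(void)
    {
        /* The same kind of corruption that fooled fletcher2. */
        uint32_t zeros[16] = { 0 }, corrupt[16] = { 0 };
        uint64_t c1[4], c2[4];

        corrupt[0] = corrupt[8] = 1U << 31;

        fletcher4_zfs(zeros, 16, c1);
        fletcher4_zfs(corrupt, 16, c2);

        /* Prints "detected": a becomes 2^32, well within 64 bits. */
        printf("%s\n", memcmp(c1, c2, sizeof (c1)) ? "detected"
                                                   : "collision");
        return (0);
    }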
Once you have the fixed software, how do you make sure fixed
checksums are actually covering data blocks originally written by
old, broken software? I assume you have to use rsync or zfs
send/recv to rewrite all the data with the new checksum? If so,
what do you have to do before rewriting---upgrade Solaris and then
'zfs upgrade' each filesystem one by one? Will zfs send/recv work
across the filesystem versions, or does the copying have to be
done with rsync?
I believe such a requirement would have a half-life of less than a
nanosecond.
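For anyone who chooses to rewrite anyway, the obvious pattern is a
snapshot plus send/receive into a fresh dataset (names hypothetical);
every block is rewritten on receive and checksummed according to the
target's checksum property:

    # zfs snapshot tank/data@rewrite
    # zfs send tank/data@rewrite | zfs receive tank/data_new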
3. Speaking of which, what about the checksum in zfs send streams?
Is it also fletcher2, and if so, was it also fixed in
s10u8/snv_114, and how does this affect compatibility for people
who have ignored my advice and stored streams instead of zpools?
Will a newer 'zfs recv' always work with an older 'zfs send' but
not the other way around?
fletcher4. Thanks for reminding me... I'll update my slides :-)
There is basically no information about implementing the fix in the
bug report, and we can't write to the bug from outside Sun. Whatever
sysadmins need to do to get their data under the strength of checksum
they thought it was under, it might be nice to describe it in the bug
for whoever gets referred to the bug and has an affected version.
UTSL
Bottom line: a checksum match does not guarantee correctness, but a
checksum mismatch does indicate a difference. In general, this is how
checksums work, no?
-- richard