I'm pushing the send button too often, but yes, considering what was said
before, byte-to-byte comparison should be mandatory when deduplicating, and
therefore a "lighter" hash or checksum algorithm would suffice to reduce the
number of dedup candidates. Overall, deduping would then be both
"bulletproof" and faster.
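To make the idea concrete, here is a minimal Python sketch of dedup with a cheap checksum plus verify: the checksum only narrows down the candidates, and a block counts as a duplicate only after a byte-for-byte comparison. It is an in-memory toy, not how the ZFS dedup table actually works. The checksum is modeled on ZFS's fletcher4 (four 64-bit running sums over 32-bit words); the little-endian word order, the zero padding of short blocks, and the names fletcher4, DedupStore and write are assumptions made up for this example.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(block: bytes) -> tuple:
    """Fletcher-4-style checksum: four 64-bit running sums over 32-bit little-endian words."""
    if len(block) % 4:
        # Assumption for this sketch: zero-pad to a whole number of 32-bit words.
        block += b"\x00" * (4 - len(block) % 4)
    a = b = c = d = 0
    for (w,) in struct.iter_unpack("<I", block):
        a = (a + w) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return (a, b, c, d)


class DedupStore:
    """Toy dedup table: a cheap checksum picks candidates, a byte comparison decides."""

    def __init__(self, checksum=fletcher4):
        self.checksum = checksum
        self.table = {}     # checksum -> ids of stored blocks with that checksum
        self.blocks = []    # block id -> stored bytes
        self.refcount = []  # block id -> number of references

    def write(self, data: bytes) -> int:
        key = self.checksum(data)
        for bid in self.table.get(key, []):
            # "verify": only an exact byte-for-byte match is treated as a duplicate
            if self.blocks[bid] == data:
                self.refcount[bid] += 1
                return bid
        # No candidate matched (new data or a checksum collision): store a new block.
        bid = len(self.blocks)
        self.blocks.append(data)
        self.refcount.append(1)
        self.table.setdefault(key, []).append(bid)
        return bid
```

For instance, the 5-word blocks (10,10,10,10,10) and (11,6,16,6,11) both checksum to (50, 150, 350, 700), because the difference (1,-4,6,-4,1) cancels in all four sums. The store keeps them as two separate blocks, while writing the first block again only bumps its refcount.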

On Wed, Jul 11, 2012 at 10:50 AM, Ferenc-Levente Juhos
<feci1...@gmail.com> wrote:

> Actually, although as you pointed out the chance of a sha256 collision is
> minimal, it can still happen, which would mean that the dedup algorithm
> discards a block that it thinks is a duplicate.
> It's probably better anyway to do a byte-to-byte comparison
> if the hashes match, to be sure that the blocks are really identical.
>
> The funny thing here is that ZFS tries to solve all sorts of data
> integrity issues with checksumming, healing, etc., while on the other
> hand a hash collision in the dedup algorithm can cause data loss if
> dedup is wrongly configured.
>
> Anyway, thanks for bringing up the subject. Now I know that if I
> enable the dedup feature, I must set it to sha256,verify.
>
> On Wed, Jul 11, 2012 at 10:41 AM, Ferenc-Levente Juhos
> <feci1...@gmail.com> wrote:
>
>> I was under the impression that the hash (or checksum) used for data
>> integrity is the same as the one used for deduplication,
>> but now I see that they are different.
>>
>>
>> On Wed, Jul 11, 2012 at 10:23 AM, Sašo Kiselkov
>> <skiselkov...@gmail.com> wrote:
>>
>>> On 07/11/2012 09:58 AM, Ferenc-Levente Juhos wrote:
>>> > Hello all,
>>> >
>>> > What about the fletcher2 and fletcher4 algorithms? According to the
>>> > zfs man page from Oracle, fletcher4 is the current default.
>>> > Shouldn't the fletcher algorithms be much faster than any of the SHA
>>> > algorithms?
>>> > On Wed, Jul 11, 2012 at 9:19 AM, Sašo Kiselkov <skiselkov...@gmail.com>
>>> > wrote:
>>>
>>> Fletcher is a checksum, not a cryptographic hash. It can, and often will,
>>> produce collisions, so you need to set dedup to verify (do a bit-by-bit
>>> comparison prior to deduplication), which can result in significant write
>>> amplification: every write that matches an existing dedup entry is turned
>>> into a read, and potentially another write if verify finds that the blocks
>>> are different. With cryptographic hashes you can leave verify off, since
>>> they are extremely unlikely to produce collisions (~10^-77, i.e. 2^-256,
>>> for any given pair of blocks).
>>>
>>> --
>>> Saso
>>>
>>
>>
>
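A quick way to put the "~10^-77" figure in context: it is the chance that one given pair of blocks collides under an ideal 256-bit hash. Across a whole pool with n unique blocks, the birthday bound gives roughly p = 1 - exp(-(n(n-1)/2) / 2^256). The Python sketch below assumes the hash behaves like a random function, which is a reasonable model for SHA-256 but not for Fletcher, whose collisions can be constructed on purpose. The function name collision_probability and the 2^40 block count are illustrative assumptions.

```python
import math


def collision_probability(n_blocks: int, bits: int = 256) -> float:
    """Birthday-bound probability that any two of n_blocks distinct blocks share
    a digest, assuming an ideal hash with 2**bits equally likely outputs."""
    pairs = n_blocks * (n_blocks - 1) // 2
    expected = pairs / 2**bits     # expected number of colliding pairs
    return -math.expm1(-expected)  # 1 - exp(-expected), accurate for tiny values


# Hypothetical pool holding about 2**40 unique blocks.
print(collision_probability(2**40))           # 256-bit digest: around 5e-54
print(collision_probability(2**40, bits=32))  # 32-bit digest: effectively certain
```

Even for a very large pool, the sha256 risk stays far below any realistic hardware error rate; the point of verify is to remove the remaining risk, at the cost of extra reads on matching writes.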
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
