I'm pushing the send button too often, but yes, considering what was said before: byte-to-byte comparison should be mandatory when deduplicating, and therefore a "lighter" hash or checksum algorithm would suffice to reduce the number of dedup candidates. Overall, deduping would be both "bulletproof" and faster.
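To make the idea concrete, here is a minimal sketch (plain Python, not ZFS code; the table and function names are made up for illustration): a fast checksum narrows the field to a few candidates, and a mandatory byte-to-byte comparison confirms a true duplicate before anything is discarded.

```python
import zlib

# Hypothetical in-memory dedup table: checksum -> list of stored blocks.
# Purely an illustration of the scheme, not how ZFS implements its DDT.
dedup_table = {}

def store_block(block: bytes) -> bytes:
    """Store a block, deduplicating via a "lighter" checksum plus a
    byte-to-byte verify of every candidate with the same checksum."""
    key = zlib.crc32(block)  # fast checksum; collisions are expected
    for candidate in dedup_table.get(key, []):
        if candidate == block:  # mandatory byte-to-byte comparison
            return candidate    # true duplicate: reference existing copy
    dedup_table.setdefault(key, []).append(block)  # genuinely new block
    return block
```

Because the byte-to-byte compare is what actually decides, a checksum collision costs only an extra comparison, never data loss.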
On Wed, Jul 11, 2012 at 10:50 AM, Ferenc-Levente Juhos
<feci1...@gmail.com> wrote:

> Actually, although, as you pointed out, the chance of an sha256
> collision is minimal, it can still happen; that would mean that the
> dedup algorithm discards a block that it thinks is a duplicate.
> It's probably better to do a byte-to-byte comparison anyway if the
> hashes match, to be sure that the blocks are really identical.
>
> The funny thing here is that ZFS tries to solve all sorts of data
> integrity issues with checksumming and healing, etc., and on the
> other hand a hash collision in the dedup algorithm can cause loss
> of data if wrongly configured.
>
> Anyway, thanks for bringing up the subject; now I know that if I
> enable the dedup feature, I must set it to sha256,verify.
>
> On Wed, Jul 11, 2012 at 10:41 AM, Ferenc-Levente Juhos <
> feci1...@gmail.com> wrote:
>
>> I was under the impression that the hash (or checksum) used for
>> data integrity is the same as the one used for deduplication, but
>> now I see that they are different.
>>
>> On Wed, Jul 11, 2012 at 10:23 AM, Sašo Kiselkov
>> <skiselkov...@gmail.com> wrote:
>>
>>> On 07/11/2012 09:58 AM, Ferenc-Levente Juhos wrote:
>>> > Hello all,
>>> >
>>> > What about the fletcher2 and fletcher4 algorithms? According to
>>> > the zfs man page on oracle, fletcher4 is the current default.
>>> > Shouldn't the fletcher algorithms be much faster than any of
>>> > the SHA algorithms?
>>> > On Wed, Jul 11, 2012 at 9:19 AM, Sašo Kiselkov
>>> > <skiselkov...@gmail.com> wrote:
>>>
>>> Fletcher is a checksum, not a hash. It can and often will produce
>>> collisions, so you need to set your dedup to verify (do a
>>> bit-by-bit comparison prior to deduplication), which can result
>>> in significant write amplification (every write is turned into a
>>> read and potentially another write in case verify finds the
>>> blocks are different). With hashes, you can leave verify off,
>>> since hashes are extremely unlikely (~10^-77) to produce
>>> collisions.
>>>
>>> --
>>> Saso
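For concreteness, the ~10^-77 figure quoted above corresponds to the chance that two independent 256-bit hash values coincide, i.e. 2^-256; a quick back-of-the-envelope check (plain Python, modeling SHA-256 outputs as uniformly random, which is an assumption about the hash, not a property of this code):

```python
from math import log10

# Probability that two independent, uniformly random 256-bit hashes
# are equal: one chance in 2^256.
p_pairwise = 2.0 ** -256
print(f"2^-256 ~= 10^{log10(p_pairwise):.1f}")

# Birthday bound: with n distinct blocks, the chance of *any* pair
# colliding grows roughly as n^2 / 2^257; even with 10^18 blocks in
# a pool it remains negligible.
n = 10 ** 18
p_any = n * n / 2.0 ** 257
print(f"~{p_any:.0e} chance of any collision among 10^18 blocks")
```

So even under the pessimistic birthday estimate, collisions stay astronomically unlikely; the remaining risk is only what sha256,verify eliminates.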
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss