so, I'm playing around with dedup, trying to get it set up the way I want with 
little impact on performance (we're using zfs primarily for storage of backups, 
using rsync to copy the files from our linux servers to our opensolaris/zfs 
'backupbricks').
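
For context, the copy jobs are plain rsync invocations along these lines (the 
host and paths below are just placeholders, and the exact flags vary a bit per 
server):

# rsync -aH --delete linuxserver:/export/data/ /raid3153/linuxserver/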

currently running snv_133 on x86, zpool version 22, zfs version 4

first question:
I can't seem to get fletcher4,verify enabled for the dedup property on my 
filesystems; whenever I try, I get this:
# zfs set dedup=fletcher4,verify raid3153
cannot set property for 'raid3153': 'dedup' must be one of 'on | off | verify | 
sha256[,verify]'

sha256,verify is the same as just verify (at least for now, and having it set 
explicitly is a Good Thing, imo), but I don't see anything about fletcher4 in 
that list.  I saw it had been disabled at one point because of the endianness 
issue, but it's been re-enabled, yes?
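
(The explicit sha256 form is accepted, at least, matching the list in that 
error message:)

# zfs set dedup=sha256,verify raid3153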

second question (a two-parter, actually): with regard to verify, I notice that 
it works by comparing checksums, and if it finds a match it then goes and does 
a direct byte-for-byte comparison of the data.

first part: would it not be more efficient to simply store 2 different 
checksums (both can be fast, so long as they're different algorithms) and 
compare the one checksum and then the other?  It seems the likelihood of two 
distinct blocks colliding under both algorithms is astronomically low (even 
lower than with the single checksum used today), without the penalty of having 
to re-read the entire block from disk.  Maybe it's a trade-off of increased 
memory usage vs increased reads?  Memory is cheap, disk bandwidth is not, so 
maybe having the option of a larger DDT with this feature would be useful?
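
Back-of-envelope, treating each checksum as an ideal 256-bit hash (which 
fletcher4 certainly is not, so take this as an upper bound) and assuming the 
two algorithms fail independently:

  P(collision, one checksum)   ~ 2^-256            ~ 10^-77
  P(collision, both checksums) ~ 2^-256 * 2^-256
                               = 2^-512            ~ 10^-154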

second part: is there any way to enable some sort of logging when a checksum 
collision is noticed while verify is on?  I ask because we have well over 2PB 
of data we're backing up to our zfs storage, and will have much more in the 
future, so if anyone is going to get a collision, it's probably going to be 
us.  However, if the empirical collision rate is incredibly low or 
non-existent, and I can confirm that with some sort of alert when one occurs, 
I could justify turning verify off in my dedup settings.
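
Absent a built-in knob, could DTrace watch the dedup code path?  A rough 
sketch of where I'd start (just counting entries into the ddt_* functions in 
the zfs module to find a probe point; I haven't checked which functions 
actually exist on snv_133, so treat the probe names as guesses):

# dtrace -n 'fbt:zfs:ddt_*:entry { @counts[probefunc] = count(); }'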

third question:
is there a way to look at the current size of the DDT as it exists in memory?  
I need to know whether I need more RAM in my systems to support dedup.  We 
currently have 32GB of RAM in some systems, and 16GB in others (the ones based 
on our old X4500s), with each system housing about 100TB of data.
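
The closest thing I've found is zdb's dedup statistics, which report per-table 
entry counts along with what it calls on-disk and in-core sizes:

# zdb -DD raid3153

but I'm not sure whether that in-core figure reflects what's actually resident 
in the ARC, hence the question.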

Thanks for everything, folks; zfs is awesome, and the new dedup and userquota 
features are ones I'm eagerly looking forward to implementing in our setup.  
Keep up the good work!

-Jeremy