[EMAIL PROTECTED] wrote:
> -=dave wrote:
> > one other thing... the checksums for all files to send *could* be
> > checked first in batch and known unique blocks prioritized and sent
> > first, then the possibly duplicative data sent afterwards to be
> > verified a dupe, thereby decreasing the possible data loss for the
> > backup window to levels equivalently low to the checksum collision
> > probability.
>
> ZFS doesn't checksum files. It checksums blocks. There has been
> occasional discussion on the problems with checksumming at the file
> level, if you check the archives.

That was a mistake due to my haste and was meant to be "...blocks to
send..." as used correctly later in the same sentence.

> I'm not very familiar with what vendors are claiming for
> de-duplication. Are most implementations at the file level?

Yes, most implementations I've evaluated are
at the file level, although some products use block-level, most
notably VTL backup appliance vendors.  I would suspect block-level would yield
a marginal percentage increase in dedupe vs. file level, due solely to the high
occurrence rate of the 0x00-filled block within many files, as I don't think ZFS
treats zero-filled blocks as automatically sparse?

As for their sales pitch "deduplication" claims, they are quite outlandish, on
the order of 15:1.  Of course that is based on a best-case scenario, such as all
files from 2000 machines backed up to a single appliance nightly, but then
again, such a scenario *could* yield such huge space savings.

I know that Sun axed its OS VTL project, but someone else may pick up the
torch.  There is an interesting fledgling VTL offering for Linux at
http://markh794.googlepages.com but then I digress...
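
To make the file-level vs. block-level comparison concrete, here is a rough
Python sketch that counts how many bytes each approach would have to store.
Everything in it is an assumption made for illustration: the 4096-byte block
size, the use of SHA-256, and the helper names are hypothetical and are not how
ZFS or any vendor product does it. The toy input is deliberately contrived
(mostly zero-filled blocks), so it exaggerates the zero-block effect; real
data would likely show a much smaller gap.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, for illustration only


def digest(data: bytes) -> str:
    """Content hash used as the dedupe key (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def file_level_stored(files):
    """Bytes stored when each distinct whole file is kept exactly once."""
    unique = {}
    for data in files:
        unique.setdefault(digest(data), len(data))
    return sum(unique.values())


def block_level_stored(files, block_size=BLOCK_SIZE):
    """Bytes stored when each distinct fixed-size block is kept exactly once."""
    unique = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique.setdefault(digest(block), len(block))
    return sum(unique.values())


def demo():
    """Return (raw, file-level, block-level) byte counts for a toy data set."""
    zero = bytes(BLOCK_SIZE)              # one 0x00-filled block
    a = b"A" * BLOCK_SIZE + zero * 3      # one data block plus three zero blocks
    b = b"B" * BLOCK_SIZE + zero * 3      # different data, same zero padding
    files = [a, b, a]                     # the third file duplicates the first
    raw = sum(len(f) for f in files)
    return raw, file_level_stored(files), block_level_stored(files)


if __name__ == "__main__":
    raw, by_file, by_block = demo()
    print(f"raw={raw} file-level={by_file} block-level={by_block}")
```

In this toy case file-level dedupe only collapses the one exact duplicate file,
while block-level dedupe also collapses every zero-filled block into a single
stored copy.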
I looked through the archives and found a discussion similar to this thread at
http://www.opensolaris.org/jive/thread.jspa?messageID=84033 with good (and even
some duplicative ;) discussion of implementation logistics.  A question from
Torrey McMahon that went unanswered in that thread was:
 
Is Honeycomb doing anything in this space?
-=dave 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss