[EMAIL PROTECTED] wrote:
>
> -=dave wrote:
> > one other thing... the checksums for all files to send *could* be checked first in batch and known unique blocks prioritized and sent first, then the possibly duplicative data sent afterwards to be verified a dupe, thereby decreasing the possible data loss for the backup window to levels equivolently low to the checksum collision probability.
>
> ZFS doesn't checksum files. It checksums blocks. There has
> been occasional discussion on the problems with checksumming
> at the file level, if you check the archives.

That was a mistake due to my haste; it was meant to be "...blocks to send...", as used correctly later in the same sentence.

> I'm not very familiar with what vendors are claiming for
> de-duplication. Are most implementations at the file level?

Yes, most implementations I've evaluated are at the file level, although some products use block-level, most notably those from VTL backup appliance vendors. I would suspect block-level would yield a marginal percentage increase in dedupe vs. file level, due solely to the high occurrence rate of the 0x00-filled block within many files (I don't think ZFS treats zero-filled blocks as automatically sparse, does it?).

As for their sales pitch "deduplication" claims, they are quite outlandish, on the order of 15:1. Of course those are based on a best-case scenario, such as all files from 2000 machines backed up to a single appliance nightly, but then again such a scenario *could* yield such huge space savings.

I know that Sun axed its OS VTL project, but someone else may pick up the torch. There is an interesting fledgling VTL offering for Linux at http://markh794.googlepages.com but I digress...

I looked through the archives and found a similar discussion to this thread at http://www.opensolaris.org/jive/thread.jspa?messageID=84033 with good (and even some duplicative ;) discussion of implementation logistics.
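For what it's worth, the send ordering proposed at the top of this message could be sketched roughly as below. This is only an illustration in Python and not anything ZFS or any vendor actually does: the names split_blocks and plan_send_order, the tiny 4-byte block size, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is readable


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_send_order(blocks, known_checksums):
    """Checksum every block up front, then split the batch in two.

    Returns (unique, possible_dupes) as lists of block indices:
    - unique: checksum not known to the receiver and not yet seen in
      this batch, so the data has to go over the wire; send these first.
    - possible_dupes: checksum already known, so the block is probably
      a duplicate; send (or verify) these afterwards.
    """
    seen = set(known_checksums)
    unique, possible_dupes = [], []
    for index, block in enumerate(blocks):
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            possible_dupes.append(index)
        else:
            unique.append(index)
            seen.add(digest)  # later copies in this batch count as dupes
    return unique, possible_dupes


if __name__ == "__main__":
    blocks = split_blocks(b"AAAABBBBAAAA\x00\x00\x00\x00")
    already_on_target = {hashlib.sha256(b"BBBB").digest()}
    unique, possible_dupes = plan_send_order(blocks, already_on_target)
    print("send first:", unique)
    print("verify afterwards:", possible_dupes)
```

If the backup window closes after the first group has been sent, the only data at risk is in the second group, and that data is lost only if one of those checksum matches turns out to be a collision rather than a real duplicate.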
A question from Torrey McMahon that went unanswered in that thread was: is Honeycomb doing anything in this space?

-=dave
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss