On Tue, Jul 22, 2008 at 11:48 AM, Erik Trimble <[EMAIL PROTECTED]> wrote:
> No, you are right to be concerned over block-level dedup seriously
> impacting seeks.  The problem is that, given many common storage
> scenarios, you will have not just similar files, but multiple common
> sections of many files.  Things such as the various standard
> productivity app documents will not just have the same header sections,
> but internally, there will be significant duplications of considerable
> length with other documents from the same application.  Your 5MB Word
> file is thus likely to share several (actually, many) multi-kB segments
> with other Word files.  You will thus end up seeking all over the disk
> to read _most_ Word files.  Which really sucks.  I can list at least a
> couple more common scenarios where dedup has the potential to save at
> least some reasonable amount of space, yet will absolutely kill performance.

This would actually argue in favor of dedup... if the blocks are
common, they are more likely to be in the ARC with dedup, thus avoiding
a read altogether.  There would likely be greater overhead in
assembling reads from many smaller pieces, though.

Here's some real life...

I have 442 Word documents created by me and others over several years.
Many were created from the same corporate templates.  I generated the
MD5 hash of every 8 KB block of each file and came up with a total of
8,409 hashes, implying about 65 MB of Word documents.  Taking those
hashes through "sort | uniq -c | sort -n" shows the most duplicated
blocks (occurrence count, then the base64-encoded MD5 hash):

      3 p9I7HgbxFme7TlPZmsD6/Q
      3 sKE3RBwZt8A6uz+tAihMDA
      3 uA4PK1+SQqD+h1Nv6vJ6fQ
      3 wQoU2g7f+dxaBMzY5rVE5Q
      3 yM0csnXKtRxjpSxg1Zma0g
      3 yyokNamrTcD7lQiitcVgqA
      4 jdsZZfIHtshYZiexfX3bQw
     17 pohs0DWPFwF8HJ8p/HnFKw
     19 s0eKyh/vT1LothTvsqtZOw
     64 CCn3F0CqsauYsz6uId7hIg

Note that "CCn3F0CqsauYsz6uId7hIg" is the MD5 hash of 8 KB of zeros.
If compression is used as well, this block would not even be stored.
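
In case anyone wants to repeat this on their own data, here's a rough
sketch of the sort of script involved (not the exact one I ran; the
8 KB block size and the unpadded base64 MD5 output are assumptions
chosen to match the hashes above):

#!/usr/bin/env python
# Rough sketch: print one unpadded base64 MD5 digest per fixed-size
# block of each file named on the command line, suitable for piping
# into "sort | uniq -c | sort -n".
import base64, hashlib, sys

BLOCK_SIZE = 8192    # try 512 or 131072 for the other runs below

def block_hashes(path, block_size=BLOCK_SIZE):
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # 22-character unpadded base64, like the hashes shown above
            yield base64.b64encode(
                hashlib.md5(block).digest()).decode("ascii").rstrip("=")

if __name__ == "__main__":
    for path in sys.argv[1:]:
        for h in block_hashes(path):
            print(h)

Something like "python blockhash.py *.doc | sort | uniq -c | sort -n"
(the script name is just for illustration) produces tallies like the
ones above.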

If 512-byte blocks are used, the story is a bit different:

     81 DEf6rofNmnr1g5f7oaV75w
    109 3gP+ZaZ2XKqMkTQ6zGLP/A
    121 ypk+0ryBeMVRnnjYQD2ZEA
    124 HcuMdyNKV7FDYcPqvb2o3Q
    371 s0eKyh/vT1LothTvsqtZOw
    372 ozgGMCCoc+0/RFbFDO8MsQ
   8535 v2GerAzfP2jUluqTRBN+iw

As you might guess, the most common hash is that of a 512-byte block of zeros.

Most likely, however, these files will end up using 128 KB blocks for
the first part of each file and smaller blocks for the portions that
don't fit.  When I look at just 128 KB blocks...

      1 znJqBX8RtPrAOV2I6b5Wew
      2 6tuJccWHGVwv3v4nee6B9w
      2 Qr//PMqqhMtuKfgKhUIWVA
      2 idX0awfYjjFmwHwi60MAxg
      2 s0eKyh/vT1LothTvsqtZOw
      3 +Q/cXnknPr/uUCARsaSIGw
      3 /kyIGuWnPH/dC5ETtMqqLw
      3 4G/QmksvChYvfhAX+rfgzg
      3 SCMoKuvPepBdQEBVrTccvA
      3 vbaNWd5IQvsGdQ9R8dIqhw

There is actually very little duplication in Word files.  Many of the
dupes above are from various revisions of the same files.

> Dedup Advantages:
>
> (1)  save space relative to the amount of duplication.  this is highly
> dependent on workload, and ranges from 0% to 99%, but the distribution
> of possibilities isn't a bell curve (i.e. the average space saved isn't
> 50%).

I have evidence that shows 75% duplicate data on (mostly sparse) zone
roots created and maintained over an 18-month period.  I show other
evidence above that it is not nearly as good for one person's
collection of Word documents.  I suspect it would be different if the
file system I did this on were on a file server where all of my
colleagues also stored their documents (and the revisions of mine that
they have reviewed).

> (2)  noticable write performance penalty (assuming block-level dedup on
> write), with potential write cache issues.

Depends on the approach taken.
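
To make that concrete, here's a toy sketch of an in-band approach,
where the write path computes a checksum and consults a dedup table
before allocating anything.  The names and structures below are purely
illustrative (not ZFS code, and not any vendor's design), but they show
that the write-time cost can be one hash plus one table lookup, which
is a very different penalty from a post-write scan of the pool.

import hashlib

class InbandDedupStore:
    """Toy in-band dedup: identical blocks share a single allocation."""

    def __init__(self):
        self.refcount = {}   # checksum -> number of references
        self.location = {}   # checksum -> simulated on-disk address
        self.next_addr = 0

    def write_block(self, data):
        checksum = hashlib.sha256(data).hexdigest()
        if checksum in self.location:
            # Duplicate block: bump the refcount; no new allocation
            # and no media write are needed.
            self.refcount[checksum] += 1
        else:
            # New block: allocate space and enter it in the table.
            self.location[checksum] = self.next_addr
            self.refcount[checksum] = 1
            self.next_addr += len(data)
        return self.location[checksum]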

> (3)  very significant post-write dedup time, at least on the order of
> 'zfs scrub'. Also, during such a post-write scenario, it more or less
> takes the zpool out of usage.

The ZFS competition that has this in a shipping product today does not
quiesce the file system during dedup passes.

> (4) If dedup is done at block level, not at file level, it kills read
> performance, effectively turning all dedup'd files from sequential read
> to a random read.  That is, block-level dedup drastically accelerates
> filesystem fragmentation.

Absent data that shows this, I don't accept the claim.  Arguably, the
blocks that are duplicated are more likely to be in cache.  I think
that my analysis above shows that this is not a concern for my data
set.

> (5)  Something no one has talked about, but is of concern. By removing
> duplication, you increase the likelihood that loss of the "master"
> segment will corrupt many more files. Yes, ZFS has self-healing and
> such.  But, particularly in the case where there is no ZFS pool
> redundancy (or pool-level redundancy has been compromised), loss of one
> block can thus be many times more severe.

I believe this is true and likely a good topic for discussion.

> We need to think long and hard about what the real widespread benefits
> are of dedup before committing to a filesystem-level solution, rather
> than an application-level one.  In particular, we need some real-world
> data on the actual level of duplication under a wide variety of
> circumstances.

The key thing here is that distributed applications will not play
nicely with an application-level approach.  In my best use case,
Solaris zones and LDoms are the "application".  I don't expect or want
Solaris to form some sort of P2P storage system across my data center
to save a few terabytes.  D12n at the storage device can do this much
more reliably and with less complexity.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/