On Tue, Jul 22, 2008 at 11:48 AM, Erik Trimble <[EMAIL PROTECTED]> wrote:
> No, you are right to be concerned over block-level dedup seriously
> impacting seeks. The problem is that, given many common storage
> scenarios, you will have not just similar files, but multiple common
> sections of many files. Things such as the various standard
> productivity app documents will not just have the same header sections,
> but internally, there will be significant duplications of considerable
> length with other documents from the same application. Your 5MB Word
> file is thus likely to share several (actually, many) multi-kB segments
> with other Word files. You will thus end up seeking all over the disk
> to read _most_ Word files. Which really sucks. I can list at least a
> couple more common scenarios where dedup has the potential to save at
> least some reasonable amount of space, yet will absolutely kill
> performance.
This would actually argue in favor of dedup. If the blocks are common,
they are more likely to be in the ARC with dedup, thus avoiding a read
altogether. There would likely be greater overhead in assembling the
smaller packets, though.

Here's some real life... I have 442 Word documents created by me and
others over several years. Many were created from the same corporate
templates. I generated the MD5 hash of every 8 KB of each file (a rough
sketch of the method appears below) and came up with a total of 8409
hashes - implying 65 MB of Word documents. Taking those hashes through
"sort | uniq -c | sort -n" led to the following:

      3 p9I7HgbxFme7TlPZmsD6/Q
      3 sKE3RBwZt8A6uz+tAihMDA
      3 uA4PK1+SQqD+h1Nv6vJ6fQ
      3 wQoU2g7f+dxaBMzY5rVE5Q
      3 yM0csnXKtRxjpSxg1Zma0g
      3 yyokNamrTcD7lQiitcVgqA
      4 jdsZZfIHtshYZiexfX3bQw
     17 pohs0DWPFwF8HJ8p/HnFKw
     19 s0eKyh/vT1LothTvsqtZOw
     64 CCn3F0CqsauYsz6uId7hIg

Note that "CCn3F0CqsauYsz6uId7hIg" is the MD5 hash of 8 KB of zeros. If
compression is used as well, this block would not even be stored.

If 512-byte blocks are used, the story is a bit different:

     81 DEf6rofNmnr1g5f7oaV75w
    109 3gP+ZaZ2XKqMkTQ6zGLP/A
    121 ypk+0ryBeMVRnnjYQD2ZEA
    124 HcuMdyNKV7FDYcPqvb2o3Q
    371 s0eKyh/vT1LothTvsqtZOw
    372 ozgGMCCoc+0/RFbFDO8MsQ
   8535 v2GerAzfP2jUluqTRBN+iw

As you might guess, the most common hash is a block of zeros. Most
likely, however, these files will end up using 128K blocks for the first
part of the file and smaller blocks for the portions that don't fit.
When I look at just 128K blocks...

      1 znJqBX8RtPrAOV2I6b5Wew
      2 6tuJccWHGVwv3v4nee6B9w
      2 Qr//PMqqhMtuKfgKhUIWVA
      2 idX0awfYjjFmwHwi60MAxg
      2 s0eKyh/vT1LothTvsqtZOw
      3 +Q/cXnknPr/uUCARsaSIGw
      3 /kyIGuWnPH/dC5ETtMqqLw
      3 4G/QmksvChYvfhAX+rfgzg
      3 SCMoKuvPepBdQEBVrTccvA
      3 vbaNWd5IQvsGdQ9R8dIqhw

There is actually very little duplication in Word files. Many of the
dupes above are from various revisions of the same files.
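For anyone who wants to repeat this on their own data, here is roughly
what I did, reduced to a minimal sketch in Python. The .doc filename
filter, the in-memory tally (instead of "sort | uniq -c | sort -n"), and
the base64-encoded MD5 digests with the trailing padding stripped are
illustrative assumptions rather than exactly what I ran; change
BLOCK_SIZE to 512 or 128 * 1024 to get the other tables.

#!/usr/bin/env python
# Sketch only: hash each fixed-size block of every .doc file under a
# directory and print per-hash counts, least common first. The block
# size, filename filter, and digest formatting are assumptions made for
# illustration.
import base64
import hashlib
import os
import sys
from collections import Counter

BLOCK_SIZE = 8 * 1024   # try 512 or 128 * 1024 to compare block sizes

def block_hashes(path, block_size=BLOCK_SIZE):
    """Yield a base64-encoded MD5 digest for each block of the file."""
    with open(path, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digest = hashlib.md5(block).digest()
            yield base64.b64encode(digest).decode('ascii').rstrip('=')

def main(top):
    counts = Counter()
    for dirpath, _, filenames in os.walk(top):
        for name in filenames:
            if name.lower().endswith('.doc'):
                for h in block_hashes(os.path.join(dirpath, name)):
                    counts[h] += 1
    # Equivalent of "sort | uniq -c | sort -n": least common hash first.
    for h, n in sorted(counts.items(), key=lambda kv: kv[1]):
        print('%7d %s' % (n, h))

if __name__ == '__main__':
    main(sys.argv[1] if len(sys.argv) > 1 else '.')

Run against a directory of documents, it prints one line per unique
block hash in the same shape as the listings above.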
> Dedup Advantages:
>
> (1) save space relative to the amount of duplication. this is highly
> dependent on workload, and ranges from 0% to 99%, but the distribution
> of possibilities isn't a bell curve (i.e. the average space saved isn't
> 50%).

I have evidence that shows 75% duplicate data on (mostly sparse) zone
roots created and maintained over an 18 month period. I show other
evidence above that it is not nearly as good for one person's copy of
Word documents. I suspect that it would be different if the file system
that I did this on were on a file server where all of my colleagues also
stored their documents (and revisions of mine that they have reviewed).

> (2) noticeable write performance penalty (assuming block-level dedup on
> write), with potential write cache issues.

Depends on the approach taken.

> (3) very significant post-write dedup time, at least on the order of
> 'zfs scrub'. Also, during such a post-write scenario, it more or less
> takes the zpool out of usage.

The ZFS competition that has this in shipping product today does not
quiesce the file system during dedup passes.

> (4) If dedup is done at block level, not at file level, it kills read
> performance, effectively turning all dedup'd files from sequential read
> to a random read. That is, block-level dedup drastically accelerates
> filesystem fragmentation.

Absent data that shows this, I don't accept this claim. Arguably the
blocks that are duplicated are more likely to be in cache. I think that
my analysis above shows that this is not a concern for my data set.

> (5) Something no one has talked about, but is of concern. By removing
> duplication, you increase the likelihood that loss of the "master"
> segment will corrupt many more files. Yes, ZFS has self-healing and
> such. But, particularly in the case where there is no ZFS pool
> redundancy (or pool-level redundancy has been compromised), loss of one
> block can thus be many more times severe.

I believe this is true and likely a good topic for discussion.

> We need to think long and hard about what the real widespread benefits
> are of dedup before committing to a filesystem-level solution, rather
> than an application-level one. In particular, we need some real-world
> data on the actual level of duplication under a wide variety of
> circumstances.

The key thing here is that distributed applications will not play
nicely. In my best use case, Solaris zones and LDoms are the
"application". I don't expect or want Solaris to form some sort of P2P
storage system across my data center to save a few terabytes. D12n at
the storage device can do this much more reliably and with less
complexity.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/