Chris Cosby wrote:
>
> On Tue, Jul 22, 2008 at 11:19 AM, <[EMAIL PROTECTED]> wrote:
>
> [EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:
>
> > To do dedup properly, it seems like there would have to be some overly
> > complicated methodology for a sort of delayed dedup of the data. For
> > speed, you'd want your writes to go straight into the cache and get
> > flushed out as quickly as possible, keeping everything as ACID as
> > possible. Then, a dedup scrubber would take what was written, do the
> > voodoo magic of checksumming the new data, scanning the tree to see if
> > there are any matches, locking the duplicates, running the usage
> > counters up or down for that block of data, swapping out inodes, and
> > marking the duplicate data as free space.
>
> I agree, but what you are describing is file-based dedup; ZFS already has
> the groundwork for dedup in the system (block-level checksumming and
> pointers).
>
> > It's a lofty goal, but one that is doable. I guess this is only
> > necessary if deduplication is done at the file level. If done at the
> > block level, it could possibly be done on the fly, what with the
> > already implemented checksumming at the block level,
>
> Exactly -- that is why it is attractive for ZFS: so much of the
> groundwork is done and needed for the fs/pool already.
>
> > but then your reads will suffer because pieces of files can potentially
> > be spread all over hell and half of Georgia on the zdevs.
>
> I don't know that you can make this statement without some study of an
> actual implementation on real-world data -- and then, because it is block
> based, you should see varying degrees of this dedup-flack-frag depending
> on data/usage.
>
> It's just a NonScientificWAG. I agree that most of the duplicated blocks
> will in most cases be part of identical files anyway, and thus lined up
> exactly as you'd want them. I was just free thinking and typing.
>
No, you are right to be concerned over block-level dedup seriously
impacting seeks. The problem is that, given many common storage scenarios,
you will have not just similar files, but multiple common sections across
many files. Things such as the various standard productivity-app documents
will not just have the same header sections; internally, there will be
significant duplications of considerable length with other documents from
the same application. Your 5MB Word file is thus likely to share several
(actually, many) multi-kB segments with other Word files. You will thus end
up seeking all over the disk to read _most_ Word files. Which really sucks.
I can list at least a couple more common scenarios where dedup has the
potential to save at least some reasonable amount of space, yet will
absolutely kill performance.
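To make that "delayed dedup scrubber" idea concrete, here is a rough sketch
(plain Python, purely illustrative; BlockStore, scrub_dedup and the
in-memory dicts are invented for the example and do not correspond to any
ZFS interface) of a post-write pass that checksums newly written blocks,
looks each checksum up in a table of known blocks, and either records the
block as the canonical copy or bumps the canonical copy's refcount and
frees the duplicate:

import hashlib
from dataclasses import dataclass, field

@dataclass
class BlockStore:
    blocks: dict = field(default_factory=dict)       # block_id -> data bytes
    by_checksum: dict = field(default_factory=dict)  # checksum -> canonical block_id
    refcount: dict = field(default_factory=dict)     # block_id -> reference count
    remap: dict = field(default_factory=dict)        # duplicate block_id -> canonical block_id

    def checksum(self, data: bytes) -> str:
        # Stand-in for the per-block checksum the filesystem already keeps.
        return hashlib.sha256(data).hexdigest()

    def scrub_dedup(self, newly_written):
        """Post-write pass: fold newly written blocks into existing copies."""
        freed = 0
        for bid in newly_written:
            csum = self.checksum(self.blocks[bid])
            canonical = self.by_checksum.get(csum)
            if canonical is None or canonical == bid:
                # First copy of this content seen: it becomes the canonical block.
                self.by_checksum[csum] = bid
                self.refcount.setdefault(bid, 1)
            else:
                # Duplicate: bump the canonical block's usage counter, repoint
                # the duplicate at it, and reclaim the duplicate's space.
                self.refcount[canonical] += 1
                self.remap[bid] = canonical
                del self.blocks[bid]
                freed += 1
        return freed

# Example: blocks 1 and 3 hold identical data, so the pass frees one of them.
store = BlockStore()
store.blocks = {1: b"header", 2: b"body", 3: b"header"}
print(store.scrub_dedup([1, 2, 3]))   # -> 1
print(store.refcount)                 # -> {1: 2, 2: 1}

A real implementation would of course have to do this against on-disk
structures, with locking of the duplicates while they are repointed, which
is exactly where the complexity comes in.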
> For instance, I would imagine that in many scenarios much of the dedup
> data blocks would belong to the same or very similar files. In this case
> the blocks were written as best they could on the first write, and the
> deduped blocks would point to a pretty sequential line of blocks. Now, on
> some files there may be duplicate headers or similar portions of data --
> these may cause you to jump around the disk; but I do not know how much
> this would be hit, or how much it would impact real-world usage.
>
> > Deduplication is going to require the judicious application of
> > hallucinogens and man hours. I expect that someone is up to the task.
>
> I would prefer the coder(s) not be seeing "pink elephants" while writing
> this, but yes, it can and will be done. It (I believe) will be easier
> after the grow/shrink/evac code paths are in place, though. Also, the
> grow/shrink/evac path allows (if it is done right) for other cool things,
> like a base on which to build a roaming defrag that takes into account
> snaps, clones, live and the like. I know that some feel that the
> grow/shrink/evac code is more important for home users, but I think that
> it is super important for most of these additional features.
>
> The elephants are just there to keep the coders company. There are tons
> of benefits for dedup, both for home and non-home users. I'm happy that
> it's going to be done. I expect the first complaints will come from those
> people who don't understand it, whose df and du numbers look different
> than their zpool status ones. Perhaps df/du will just have to be faked
> out for those folks, or we just apply the same hallucinogens to them
> instead.
>
I'm still not convinced that dedup is really worth it for anything but very
limited, constrained usage. Disk is just so cheap that you _really_ have to
have an enormous amount of dup before the performance penalties of dedup
are countered. This in many ways reminds me of last year's discussion over
file versioning in the filesystem: it sounds like a cool idea, but it's not
a generally good one. I tend to think that this kind of problem is better
served by applications handling it, if they are concerned about it. Pretty
much, here's what I've heard:

Dedup Advantages:

(1) Saves space relative to the amount of duplication. This is highly
    dependent on workload, and ranges from 0% to 99%, but the distribution
    of possibilities isn't a bell curve (i.e. the average space saved isn't
    50%).

Dedup Disadvantages:

(1) Increases codebase complexity, both for dedup during write and for
    ex-post-facto batched dedup.

(2) Noticeable write performance penalty (assuming block-level dedup on
    write), with potential write-cache issues.

(3) Very significant post-write dedup time, at least on the order of a
    'zfs scrub'. Also, such a post-write pass more or less takes the zpool
    out of use while it runs.

(4) If dedup is done at the block level, not the file level, it kills read
    performance, effectively turning every dedup'd file from a sequential
    read into a random read. That is, block-level dedup drastically
    accelerates filesystem fragmentation (see the toy model below).

(5) Something no one has talked about, but which is a concern: by removing
    duplication, you increase the likelihood that loss of the "master"
    segment will corrupt many more files. Yes, ZFS has self-healing and
    such. But, particularly where there is no ZFS pool redundancy (or
    pool-level redundancy has been compromised), the loss of one block can
    thus be many times more severe.
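To put a toy number on disadvantage (4): the sketch below is again
illustrative Python only; the block addresses are made up and this is not
how ZFS lays out or remaps blocks. It simply counts how many head movements
a whole-file read needs once a few blocks of a logically sequential file
have been remapped to shared copies written elsewhere on the disk.

def seeks_needed(block_addresses):
    """Count head movements for reading blocks in logical file order.

    A new seek is charged whenever the next block is not physically
    adjacent to the previous one.
    """
    seeks = 1  # initial positioning
    for prev, cur in zip(block_addresses, block_addresses[1:]):
        if cur != prev + 1:
            seeks += 1
    return seeks

# Before dedup: the file's 8 blocks were allocated contiguously.
original = [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]

# After block-level dedup: three blocks now point at canonical copies
# written long ago by other files, far from this file's extent.
deduped = [1000, 1001, 52000, 1003, 7340, 1005, 91822, 1007]

print(seeks_needed(original))  # 1  -- one sequential pass
print(seeks_needed(deduped))   # 7  -- nearly every block needs a seek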
We need to think long and hard about what the real, widespread benefits of
dedup are before committing to a filesystem-level solution rather than an
application-level one. In particular, we need some real-world data on the
actual level of duplication under a wide variety of circumstances.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss