On Wed, May 04, 2011 at 12:29:06PM -0700, Erik Trimble wrote:
> On 5/4/2011 9:57 AM, Ray Van Dolson wrote:
> > There are a number of threads (this one[1] for example) that
> > describe memory requirements for deduplication.  They're pretty
> > high.
> >
> > I'm trying to get a better understanding... on our NetApps we use
> > 4K block sizes with their post-process deduplication and get pretty
> > good dedupe ratios for VM content.
> >
> > Using ZFS we are using 128K record sizes by default, which nets us
> > less impressive savings... however, to drop to a 4K record size
> > would theoretically require that we have nearly 40GB of memory for
> > only 1TB of storage (based on 150 bytes per block for the DDT).
> >
> > This obviously becomes prohibitive for 10+ TB file systems.
> >
> > I will note that our NetApps are using only 2TB FlexVols, but I
> > would like to better understand ZFS's (apparently) higher memory
> > requirements... or maybe I'm missing something entirely.
> >
> > Thanks,
> > Ray
>
> I'm not familiar with NetApp's implementation, so I can't speak to
> why it might appear to use fewer resources.
>
> However, there are a couple of possible issues here:
>
> (1) Pre-write vs. Post-write Deduplication.
> ZFS does pre-write dedup, where it looks for duplicates before it
> writes anything to disk.  In order to do pre-write dedup, you really
> have to store the ENTIRE deduplication block lookup table in some
> sort of fast (random) access media, realistically Flash or RAM.  The
> win is that you get significantly lower disk utilization (i.e.
> better I/O performance), as (potentially) much less data is actually
> written to disk.
> Post-write dedup is done via batch processing - that is, such a
> design has the system periodically scan the saved data, looking for
> duplicates.  While this method also benefits greatly from being able
> to store the dedup table in fast random storage, it's nowhere near
> as critical.  The downside is much higher disk utilization - the
> system must first write all new data to disk (without looking for
> dedup), and then must also perform significant I/O later on to do
> the dedup.
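
To make the pre-write path concrete, here's a rough sketch of the two
approaches.  This is illustrative Python, not ZFS's actual code; the
write_block()/write_to_disk() names and the in-memory dict standing in
for the DDT are assumptions for the example only.

    # Conceptual sketch of pre-write dedup -- not ZFS source.  The point
    # is that every single write probes the dedup table before any data
    # hits disk, which is why the whole table needs to live in fast
    # (random-access) storage.
    import hashlib

    ddt = {}   # checksum -> (block_pointer, refcount); hypothetical DDT

    def write_block(data, write_to_disk):
        key = hashlib.sha256(data).digest()
        entry = ddt.get(key)
        if entry is not None:                 # duplicate: no data I/O at all
            bp, refcount = entry
            ddt[key] = (bp, refcount + 1)
            return bp
        bp = write_to_disk(data)              # unique block: pay the write now
        ddt[key] = (bp, 1)
        return bp

    # A post-write (batch) design instead accepts every write immediately
    # and later runs a scanner that re-reads stored blocks, builds the same
    # kind of table, and remaps duplicates -- less pressure to keep the
    # table in fast storage, but a lot more disk I/O after the fact.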
Makes sense.

> (2) Block size: a 4k block size will yield better dedup than a 128k
> block size, presuming reasonable data turnover.  This is inherent,
> as any single-bit change in a block will make it non-duplicated.
> With 32x the block size, there is a much greater chance that a small
> change in data will cause a large loss of dedup ratio.  That is, 4k
> blocks should almost always yield much better dedup ratios than
> larger ones.
> Also, remember that the ZFS block size is a SUGGESTION for zfs
> filesystems (i.e. it will use UP TO that block size, but not always
> that size), but is FIXED for zvols.
>
> (3) Method of storing (and data stored in) the dedup table.
> ZFS's current design is (IMHO) rather piggy on DDT and L2ARC lookup
> requirements.  Right now, ZFS requires a record in the ARC (RAM) for
> each L2ARC (cache) entry, PLUS the actual L2ARC entry.  So, it boils
> down to 500+ bytes of combined L2ARC & RAM usage per block entry in
> the DDT.  Also, the actual DDT entry itself is perhaps larger than
> absolutely necessary.

So the addition of L2ARC doesn't necessarily reduce the need for
memory (at least not by much, if we're talking about 500 bytes
combined)?  I was hoping we could slap in 80GB of SSD L2ARC and get
away with "only" 16GB of RAM, for example.

> I suspect that NetApp does the following to limit their resource
> usage: they presume the presence of some sort of cache that can be
> dedicated to the DDT (and, since they also control the hardware,
> they can make sure one is always present).  Thus, they can make
> their code completely avoid the need for an equivalent to the
> ARC-based lookup.  In addition, I suspect they have a smaller DDT
> entry itself.  Which boils down to probably needing 50% of the total
> resource consumption of ZFS, and NO (or an extremely small, fixed)
> RAM requirement.
>
> Honestly, ZFS's cache (L2ARC) requirements aren't really a problem.
> The big issue is the ARC requirements, which, until they can be
> seriously reduced (or, best case, simply eliminated), really are a
> significant barrier to adoption of ZFS dedup.
>
> Right now, ZFS treats DDT entries like any other data or metadata in
> how they age from ARC to L2ARC to gone.  IMHO, the better way to do
> this is simply to require the DDT to be stored entirely on the L2ARC
> (if present), and never keep any DDT info in the ARC at all (that
> is, the ARC should contain a pointer to the DDT in the L2ARC, and
> that's it, regardless of the amount or frequency of access of the
> DDT).  Frankly, at this point, I'd almost change the design to
> REQUIRE an L2ARC device in order to turn on dedup.

Thanks for your response, Erik.  Very helpful.

Ray
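
For reference, a rough back-of-the-envelope check of the numbers in
this thread: the "nearly 40GB" per TB figure for 4K blocks, and what
Erik's "500+ bytes combined" figure would mean for the 80GB-L2ARC /
16GB-RAM idea.  Illustrative Python only; the 320/180-byte split
between L2ARC entry and ARC header is an assumption chosen to add up
to roughly 500 bytes (Erik only gives the combined figure), and a real
pool needs ARC for plenty of things besides the DDT.

    # Back-of-the-envelope DDT sizing using the per-entry costs quoted in
    # this thread.  150 bytes/block for the DDT itself comes from Ray's
    # post; the 320/180 L2ARC/ARC split is an illustrative assumption that
    # sums to roughly Erik's "500+ bytes combined" figure.
    TiB = 1 << 40
    GiB = 1 << 30

    def ddt_sizing(unique_bytes, block_size, ddt_entry=150,
                   l2arc_per_entry=320, arc_per_entry=180):
        entries = unique_bytes // block_size
        return {
            "entries":     entries,
            "ddt_ram_gib": entries * ddt_entry / GiB,      # table held in RAM
            "l2arc_gib":   entries * l2arc_per_entry / GiB,
            "arc_gib":     entries * arc_per_entry / GiB,  # ARC headers still needed
        }

    # 1 TiB of unique data at 4K blocks: ~268M entries and ~37.5 GiB of
    # DDT in RAM -- the "nearly 40GB" figure above.  At 128K records it
    # drops to about 1.2 GiB.
    print(ddt_sizing(1 * TiB, 4 * 1024))
    print(ddt_sizing(1 * TiB, 128 * 1024))

    # The 80GB/16GB question: with ~180 bytes of ARC header per
    # L2ARC-resident DDT entry (assumed split), 16 GiB of ARC covers only
    # about 95M entries, i.e. roughly 364 GiB of unique 4K data, before
    # any other ARC use -- so the L2ARC alone doesn't make the RAM
    # requirement go away.
    print(int(16 * GiB / 180) * 4096 / GiB, "GiB of unique 4K data, max")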