On 5/4/2011 9:57 AM, Ray Van Dolson wrote:
There are a number of threads (this one[1] for example) that describe
memory requirements for deduplication. They're pretty high.
I'm trying to get a better understanding... on our NetApps we use 4K
block sizes with their post-process deduplication and get pretty good
dedupe ratios for VM content.
With ZFS we are using 128K record sizes by default, which nets us less
impressive savings... however, to drop to a 4K record size would
theoretically require that we have nearly 40GB of memory for only 1TB
of storage (based on 150 bytes per block for the DDT).
This obviously becomes prohibitive for 10+ TB file systems.
I will note that our NetApps are using only 2TB FlexVols, but would
like to better understand ZFS's (apparently) higher memory
requirements... or maybe I'm missing something entirely.
Thanks,
Ray
[1] http://markmail.org/message/wile6kawka6qnjdw
I'm not familiar with NetApp's implementation, so I can't speak to why
it might appear to use fewer resources.
However, there are a couple of possible issues here:
(1) Pre-write vs Post-write Deduplication.
ZFS does pre-write dedup, where it looks for duplicates before
it writes anything to disk. In order to do pre-write dedup, you really
have to store the ENTIRE deduplication block lookup table in some sort
of fast (random) access media, realistically Flash or RAM. The win is
that you get significantly lower disk utilization (i.e. better I/O
performance), as (potentially) much less data is actually written to disk.
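To make that concrete, here is a toy Python sketch of an inline (pre-write)
dedup write path - purely an illustration of the technique, not ZFS code; the
checksum choice and the in-memory dict stand in for the real DDT:

    import hashlib

    ddt = {}          # checksum -> (block address, refcount); must be fast to probe
    next_addr = 0

    def write_block(data):
        """Return the address holding 'data', writing it only if it is new."""
        global next_addr
        key = hashlib.sha256(data).digest()
        if key in ddt:                        # duplicate: no data write at all
            addr, refs = ddt[key]
            ddt[key] = (addr, refs + 1)
            return addr
        addr, next_addr = next_addr, next_addr + 1
        ddt[key] = (addr, 1)                  # unique: allocate, write, record it
        return addr

The point is that the table gets probed on every single write, which is why
the whole thing has to sit somewhere with fast random access.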
Post-write Dedup is done via batch processing - that is, such a
design has the system periodically scan the saved data, looking for
duplicates. While this method also greatly benefits from being able to
store the dedup table in fast random storage, it's not anywhere near as
critical. The downside here is that you see much higher disk utilization
- the system must first write all new data to disk (without looking for
dedup), and then must also perform significant I/O later on to do the dedup.
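By contrast, a post-process scheme looks roughly like this sketch (again, an
illustration only, not NetApp's implementation): everything is written first,
and a later pass re-reads the data to find and collapse duplicates, which is
where the extra I/O comes from:

    import hashlib

    def dedup_pass(blocks):
        """blocks: dict of address -> data already on disk.
        Returns a remap of duplicate addresses to the surviving copy."""
        seen = {}     # checksum -> first address seen with that content
        remap = {}
        for addr, data in blocks.items():
            key = hashlib.sha256(data).digest()
            if key in seen:
                remap[addr] = seen[key]       # free this block, reference the copy
            else:
                seen[key] = addr
        return remap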
(2) Block size: a 4k block size will yield better dedup than a 128k
block size, presuming reasonable data turnover. This is inherent, as
any single bit change in a block will make it non-duplicated. With 32x
the block size, there is a much greater chance that a small change in
data will cause a large loss of dedup ratio. That is, 4k blocks
should almost always yield much better dedup ratios than larger ones.
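A quick back-of-the-envelope illustration (the workload numbers are invented):
suppose a fully deduplicated set of VM images takes 1000 scattered 4 KiB guest
writes, each landing in a different record. Every touched record becomes
unique data:

    KiB = 1 << 10
    writes = 1000                       # assumed scattered 4 KiB modifications

    for recordsize in (4 * KiB, 128 * KiB):
        unique = writes * recordsize    # worst case: one whole record per write
        print("%3d KiB records: %6d KiB of formerly-deduped data now unique"
              % (recordsize // KiB, unique // KiB))

Same writes, 32x the dedup loss with 128k records.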
Also, remember that the ZFS block size is a SUGGESTION for zfs
filesystems (i.e. it will use UP TO that block size, but not always that
size), but is FIXED for zvols.
(3) Method of storing (and data stored in) the dedup table.
ZFS's current design is (IMHO) rather piggy on DDT and L2ARC
lookup requirements. Right now, ZFS requires a record in the ARC (RAM)
for each L2ARC (cache) entry, PLUS the actual L2ARC entry. So, it
boils down to 500+ bytes of combined L2ARC & RAM usage per block entry
in the DDT. Also, the actual DDT entry itself is perhaps larger than
absolutely necessary.
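Plugging in the per-entry figures from this thread (Ray's ~150 bytes per DDT
entry and the ~500 bytes of combined ARC + L2ARC usage above - rules of thumb,
not exact structure sizes):

    TiB = 1 << 40
    GiB = 1 << 30

    def ddt_cost(pool_bytes, block_size, bytes_per_entry):
        entries = pool_bytes // block_size      # worst case: every block unique
        return entries * bytes_per_entry

    for bs in (4 * 1024, 128 * 1024):
        bare = ddt_cost(1 * TiB, bs, 150)       # DDT entry alone
        full = ddt_cost(1 * TiB, bs, 500)       # plus ARC header + L2ARC entry
        print("%4dK blocks: ~%5.1f GiB of DDT, ~%5.1f GiB ARC+L2ARC"
              % (bs // 1024, bare / GiB, full / GiB))

That's where Ray's ~40GB-per-TB figure for 4k blocks comes from, and why the
combined ARC + L2ARC overhead is even worse.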
I suspect that NetApp does the following to limit their
resource usage: they presume the presence of some sort of cache that
can be dedicated to the DDT (and, since they also control the hardware,
they can make sure there is always one present). Thus, they can make
their code completely avoid the need for an equivalent to the ARC-based
lookup. In addition, I suspect they have a smaller DDT entry itself.
That probably boils down to needing about 50% of ZFS's total resource
consumption, with NO (or an extremely small, fixed) RAM requirement.
Honestly, ZFS's cache (L2ARC) requirements aren't really a problem. The
big issue is the ARC requirement, which, until it can be seriously
reduced (or, best case, simply eliminated), really is a significant
barrier to adoption of ZFS dedup.
Right now, ZFS treats DDT entries like any other data or metadata in how
they age from ARC to L2ARC to gone. IMHO, the better way to do this is
to simply require the DDT to be entirely stored on the L2ARC (if present),
and never keep any DDT info in the ARC at all (that is, the ARC
should contain a pointer to the DDT in the L2ARC, and that's it,
regardless of the amount or frequency of access of the DDT). Frankly,
at this point, I'd almost change the design to REQUIRE an L2ARC device in
order to turn on Dedup.
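Rough numbers on what that would save, using the same 1 TB / 4k example (the
split of the ~500 combined bytes between the RAM header and the L2ARC-resident
entry is my guess, purely for illustration):

    GiB = 1 << 30
    entries = (1 << 40) // 4096     # 1 TiB of 4 KiB blocks, worst case all unique
    arc_header  = 180               # assumed per-entry RAM overhead today
    l2arc_entry = 320               # assumed on-L2ARC DDT entry size

    ram_today    = entries * arc_header      # grows linearly with the DDT
    ram_proposed = 64                        # a pointer/index to the L2ARC-resident DDT
    print("today:    ~%.1f GiB of ARC just for DDT bookkeeping" % (ram_today / GiB))
    print("proposed: ~%d bytes of ARC; per-entry cost stays on the L2ARC device"
          % ram_proposed)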
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss