On Wed, May 04, 2011 at 12:29:06PM -0700, Erik Trimble wrote:
> On 5/4/2011 9:57 AM, Ray Van Dolson wrote:
> > There are a number of threads (this one[1] for example) that describe
> > memory requirements for deduplication.  They're pretty high.
> >
> > I'm trying to get a better understanding... on our NetApps we use 4K
> > block sizes with their post-process deduplication and get pretty good
> > dedupe ratios for VM content.
> >
> > Using ZFS we are using 128K record sizes by default, which nets us less
> > impressive savings... however, to drop to a 4K record size would
> > theoretically require that we have nearly 40GB of memory for only 1TB
> > of storage (based on 150 bytes per block for the DDT).
> >
> > This obviously becomes prohibitively higher for 10+ TB file systems.
> >
> > I will note that our NetApps are using only 2TB FlexVols, but would
> > like to better understand ZFS's (apparently) higher memory
> > requirements... or maybe I'm missing something entirely.
> >
> > Thanks,
> > Ray
> 
> I'm not familiar with NetApp's implementation, so I can't speak to
> why it might appear to use fewer resources.
> 
> However, there are a couple of possible issues here:
> 
> (1)  Pre-write vs Post-write Deduplication.
>          ZFS does pre-write dedup, where it looks for duplicates before 
> it writes anything to disk.  In order to do pre-write dedup, you really 
> have to store the ENTIRE deduplication block lookup table in some sort 
> of fast (random) access media, realistically Flash or RAM.  The win is 
> that you get significantly lower disk utilization (i.e. better I/O 
> performance), as (potentially) much less data is actually written to disk.
>          Post-write Dedup is done via batch processing - that is, such a 
> design has the system periodically scan the saved data, looking for 
> duplicates. While this method also greatly benefits from being able to 
> store the dedup table in fast random storage, it's nowhere near as
> critical. The downside here is that you see much higher disk utilization 
> - the system must first write all new data to disk (without looking for 
> dedup), and then must also perform significant I/O later on to do the dedup.

Makes sense.
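
Just to check my mental model of the distinction, here's a toy sketch in
Python -- nothing resembling the actual ZFS or WAFL code, with a dict
standing in for the DDT and a list standing in for the disk:

# Toy model of pre-write vs. post-write dedup -- nothing like the real
# ZFS or NetApp code, just the shape of the two approaches.
import hashlib

def pre_write_dedup(blocks, ddt, store):
    """Check the dedup table *before* writing; duplicates never hit the store."""
    for blk in blocks:
        key = hashlib.sha256(blk).hexdigest()
        if key in ddt:
            ddt[key] += 1          # duplicate: bump refcount, skip the write
        else:
            ddt[key] = 1
            store.append(blk)      # unique: pay the write now

def post_write_dedup(blocks, ddt, store):
    """Write everything first, then batch-scan the store and collapse dups."""
    store.extend(blocks)           # phase 1: every block gets written
    deduped = []
    for blk in store:              # phase 2: later scan, extra I/O
        key = hashlib.sha256(blk).hexdigest()
        if key in ddt:
            ddt[key] += 1
        else:
            ddt[key] = 1
            deduped.append(blk)
    store[:] = deduped

data = [b'A' * 4096, b'B' * 4096, b'A' * 4096]

ddt, store = {}, []
pre_write_dedup(data, ddt, store)
print(len(store))                  # 2 -- only unique blocks were ever written

ddt, store = {}, []
post_write_dedup(data, ddt, store)
print(len(store))                  # 2 kept, but all 3 were written first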

> (2) Block size:  a 4k block size will yield better dedup than a 128k 
> block size, presuming reasonable data turnover.  This is inherent, as 
> any single bit change in a block will make it non-duplicated.  With 32x 
> the block size, there is a much greater chance that a small change in 
> data will cause a large loss of dedup ratio.  That is, 4k blocks
> should almost always yield much better dedup ratios than larger ones. 
> Also, remember that the ZFS block size is a SUGGESTION for zfs 
> filesystems (i.e. it will use UP TO that block size, but not always that 
> size), but is FIXED for zvols.
> 
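
Right -- and it's easy to see how that compounds on VM images.  A quick
toy illustration (random bytes standing in for a guest image; the record
sizes and the single-byte change are just for demonstration):

# How much dedup survives a single changed byte at 4K vs. 128K records?
# Random bytes stand in for a guest image; the offset is made up.
import hashlib, os

original = os.urandom(1024 * 1024)        # 1 MiB "before" image
modified = bytearray(original)
modified[500000] ^= 0xFF                  # guest touches one byte

def chunks(buf, size):
    return [bytes(buf[i:i + size]) for i in range(0, len(buf), size)]

for recsize in (4 * 1024, 128 * 1024):
    known = set(hashlib.sha256(c).hexdigest() for c in chunks(original, recsize))
    lost = sum(1 for c in chunks(modified, recsize)
               if hashlib.sha256(c).hexdigest() not in known)
    print("%3dK records: one changed byte costs %d KB of dedup"
          % (recsize // 1024, lost * recsize // 1024))

The same one-byte write costs 32x as much lost dedup at 128K records,
which is presumably a big part of why our 128K datasets look so much
less impressive than the 4K NetApp volumes.
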
> (3) Method of storing (and data stored in) the dedup table.
>          ZFS's current design is (IMHO) rather piggy on DDT and L2ARC 
> lookup requirements. Right now, ZFS requires a record in the ARC (RAM) 
> for each L2ARC (cache) entry, PLUS the actual L2ARC entry.  So, it
> boils down to 500+ bytes of combined L2ARC & RAM usage per block entry 
> in the DDT.  Also, the actual DDT entry itself is perhaps larger than 
> absolutely necessary.

So the addition of L2ARC doesn't necessarily reduce the need for
memory (at least not much if you're talking about 500 bytes combined)?
I was hoping we could slap in 80GB of SSD L2ARC and get away with
"only" 16GB of RAM, for example.

>          I suspect that NetApp does the following to limit their 
> resource usage:   they presume the presence of some sort of cache that 
> can be dedicated to the DDT (and, since they also control the hardware, 
> they can make sure there is always one present).  Thus, they can make 
> their code completely avoid the need for an equivalent to the ARC-based 
> lookup.  In addition, I suspect they have a smaller DDT entry itself.  
> Which boils down to probably needing 50% of the total resource 
> consumption of ZFS, and NO (or extremely small, and fixed) RAM requirement.
> 
> Honestly, ZFS's cache (L2ARC) requirements aren't really a problem. The 
> big issue is the ARC requirements, which, until they can be seriously 
> reduced (or, best case, simply eliminated), really is a significant 
> barrier to adoption of ZFS dedup.
> 
> Right now, ZFS treats DDT entries like any other data or metadata in how 
> it ages from ARC to L2ARC to gone.  IMHO, the better way to do this is 
> simply require the DDT to be entirely stored on the L2ARC (if present), 
> and not ever keep any DDT info in the ARC at all (that is, the ARC 
> should contain a pointer to the DDT in the L2ARC, and that's it, 
> regardless of the amount or frequency of access of the DDT).  Frankly, 
> at this point, I'd almost change the design to REQUIRE an L2ARC device in
> order to turn on Dedup.
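
That would certainly make the sizing more predictable -- if I follow,
the RAM side stops scaling with the entry count entirely.  Crudely (the
per-entry and fixed overheads below are placeholders I made up, just to
show the shape of it):

# Crude comparison of RAM needed for DDT bookkeeping, 1 TiB pool at 4K
# records.  Both overhead figures are placeholders, not real ZFS numbers.
ENTRIES     = 2**40 // 4096    # ~268M DDT entries
ARC_RECORD  = 176              # hypothetical RAM bytes per entry, as it works today
FIXED_INDEX = 64 * 2**20       # hypothetical fixed RAM cost if the whole DDT
                               # were pinned on the L2ARC device

print("per-entry ARC records: ~%.1f GiB of RAM" % (ENTRIES * ARC_RECORD / 2.0**30))
print("fixed pointer/index:   ~%.0f MiB of RAM" % (FIXED_INDEX / 2.0**20))

Either way, I can see the argument for requiring a cache device before
dedup can be enabled at all.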

Thanks for your response, Erik.  Very helpful.

Ray