On Wed, May 04, 2011 at 03:49:12PM -0700, Erik Trimble wrote:
> On 5/4/2011 2:54 PM, Ray Van Dolson wrote:
> > On Wed, May 04, 2011 at 12:29:06PM -0700, Erik Trimble wrote:
> >> (2) Block size:  a 4k block size will yield better dedup than a 128k
> >> block size, presuming reasonable data turnover.  This is inherent, as
> >> any single bit change in a block will make it non-duplicated.  With 32x
> >> the block size, there is a much greater chance that a small change in
> >> data will cause a large loss of dedup ratio.  That is, 4k blocks
> >> should almost always yield much better dedup ratios than larger ones.
> >> Also, remember that the ZFS block size is a SUGGESTION for zfs
> >> filesystems (i.e. it will use UP TO that block size, but not always that
> >> size), but is FIXED for zvols.
> >>
> >> (3) Method of storing (and data stored in) the dedup table.
> >>           ZFS's current design is (IMHO) rather piggy on DDT and L2ARC
> >> lookup requirements. Right now, ZFS requires a record in the ARC (RAM)
> >> for each L2ARC (cache) entry, PLUS the actual L2ARC entry.  So, it
> >> boils down to 500+ bytes of combined L2ARC & RAM usage per block entry
> >> in the DDT.  Also, the actual DDT entry itself is perhaps larger than
> >> absolutely necessary.
> > So the addition of L2ARC doesn't necessarily reduce the need for
> > memory (at least not much if you're talking about 500 bytes combined)?
> > I was hoping we could slap in 80GB of SSD L2ARC and get away with
> > "only" 16GB of RAM, for example.
> 
> It reduces *somewhat* the need for RAM.  Basically, if you have no L2ARC 
> cache device, the DDT must be stored in RAM.  That's about 376 bytes per 
> dedup block.
> 
> If you have an L2ARC cache device, then the ARC must contain a reference 
> to every DDT entry stored in the L2ARC, which consumes 176 bytes per DDT 
> entry reference.
> 
> So, adding an L2ARC cuts the per-entry ARC cost from 376 to 176 bytes, 
> roughly a 53% reduction.
> 
> Of course, the other benefit of an L2ARC is the data/metadata caching, 
> which is likely worth it just by itself.

Great info.  Thanks Erik.
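
To sanity-check my expectations, here's a rough sizing sketch in Python
using your per-entry figures (376 bytes of ARC per DDT entry without
L2ARC, 176 bytes per L2ARC reference).  The 8TB pool size and the
all-unique-blocks worst case are my assumptions, so real numbers should
only come out lower:

    # Rough DDT sizing math, using Erik's per-entry figures above.
    # My assumptions: an 8 TB pool, fully written, every block unique
    # (worst case; any actual dedup hits only shrink these numbers).
    TB = 1024 ** 4
    pool_bytes = 8 * TB
    DDT_ENTRY = 376   # bytes of ARC per DDT entry, no L2ARC (per Erik)
    L2ARC_REF = 176   # bytes of ARC per DDT entry referenced from L2ARC

    for bs in (4 * 1024, 128 * 1024):
        blocks = pool_bytes // bs
        print("blocksize %6d: %d DDT entries" % (bs, blocks))
        print("  ARC, no L2ARC  : %6.1f GB" % (blocks * DDT_ENTRY / 1024.0 ** 3))
        print("  ARC, with L2ARC: %6.1f GB" % (blocks * L2ARC_REF / 1024.0 ** 3))

If I've got that right, an 8TB pool at 128k records needs about 11GB of
ARC just for the L2ARC references (so 16GB of RAM might be workable),
but at 4k records it's about 352GB even with the L2ARC in place.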

For dedupe workloads on larger file systems (8TB+), I wonder if it makes
sense to use SLC / enterprise-class SSD (or better) devices for L2ARC
instead of lower-end MLC stuff?  Seems like we'd be seeing more writes
to the device than in a non-dedupe scenario.
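
Going back to your point (2), here's a toy illustration (buffer size,
block sizes, and data are all made up) of why a small write hurts dedup
more at larger block sizes; it hashes the same buffer before and after
a one-byte change and counts blocks that still match:

    # Toy demo of the block-size effect: flip one byte in a 1 MB buffer
    # and count how many fixed-size blocks still hash identically.
    import hashlib

    def block_hashes(data, bs):
        return [hashlib.sha256(data[i:i + bs]).digest()
                for i in range(0, len(data), bs)]

    orig = bytes(bytearray(1024 * 1024))   # 1 MB of zeros
    mod = bytearray(orig)
    mod[500000] ^= 0xff                    # a single-byte change
    mod = bytes(mod)

    for bs in (4 * 1024, 128 * 1024):
        a, b = block_hashes(orig, bs), block_hashes(mod, bs)
        shared = sum(1 for x, y in zip(a, b) if x == y)
        print("blocksize %6d: %d of %d blocks still dedup" % (bs, shared, len(a)))

A one-byte write costs 1 of 256 blocks' worth of dedup at 4k, but 1 of
8 at 128k; that's the same 32x factor you mention.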

Thanks,
Ray