On Wed, May 04, 2011 at 03:49:12PM -0700, Erik Trimble wrote:
> On 5/4/2011 2:54 PM, Ray Van Dolson wrote:
> > On Wed, May 04, 2011 at 12:29:06PM -0700, Erik Trimble wrote:
> >> (2) Block size: a 4k block size will yield better dedup than a 128k
> >> block size, presuming reasonable data turnover. This is inherent, as
> >> any single-bit change in a block will make it non-duplicated. With 32x
> >> the block size, there is a much greater chance that a small change in
> >> data will cause a large loss of dedup ratio. That is, 4k blocks
> >> should almost always yield much better dedup ratios than larger ones.
> >> Also, remember that the ZFS block size is a SUGGESTION for zfs
> >> filesystems (i.e. it will use UP TO that block size, but not always that
> >> size), but is FIXED for zvols.
> >>
> >> (3) Method of storing (and data stored in) the dedup table.
> >> ZFS's current design is (IMHO) rather piggy on DDT and L2ARC
> >> lookup requirements. Right now, ZFS requires a record in the ARC (RAM)
> >> for each L2ARC (cache) entry, PLUS the actual L2ARC entry. So, it
> >> boils down to 500+ bytes of combined L2ARC & RAM usage per block entry
> >> in the DDT. Also, the actual DDT entry itself is perhaps larger than
> >> absolutely necessary.
> > So the addition of L2ARC doesn't necessarily reduce the need for
> > memory (at least not much if you're talking about 500 bytes combined)?
> > I was hoping we could slap in 80GB of SSD L2ARC and get away with
> > "only" 16GB of RAM, for example.
>
> It reduces *somewhat* the need for RAM. Basically, if you have no L2ARC
> cache device, the DDT must be stored in RAM. That's about 376 bytes per
> dedup block.
>
> If you have an L2ARC cache device, then the ARC must contain a reference
> to every DDT entry stored in the L2ARC, which consumes 176 bytes per DDT
> entry reference.
>
> So, adding an L2ARC reduces the ARC consumption by about 53% (176 of the
> original 376 bytes per entry stays in RAM).
>
> Of course, the other benefit from an L2ARC is the data/metadata caching,
> which is likely worth it just by itself.
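To put those figures in perspective, here is a back-of-the-envelope sketch
in Python. The 376-byte and 176-byte per-entry costs come from Erik's
numbers above; the example pool size (8 TiB) and assumed 2x dedup ratio are
illustrative only, and actual per-entry sizes vary across ZFS releases:

    # Rough DDT memory estimate from the per-entry figures quoted above.
    # These constants come from this thread, not from any authoritative
    # source; treat the output as an order-of-magnitude estimate.
    DDT_ENTRY_BYTES = 376   # ARC bytes per entry when the DDT lives in RAM
    DDT_REF_BYTES   = 176   # ARC bytes per reference once the entry is in L2ARC

    GiB = 1024 ** 3
    TiB = 1024 ** 4

    def ddt_footprint(pool_bytes, block_size, dedup_ratio=1.0):
        """Estimated ARC/L2ARC cost of the dedup table, in bytes."""
        unique_blocks = pool_bytes / block_size / dedup_ratio
        ram_only   = unique_blocks * DDT_ENTRY_BYTES   # no cache device
        ram_l2arc  = unique_blocks * DDT_REF_BYTES     # ARC references only
        l2arc_cost = unique_blocks * DDT_ENTRY_BYTES   # entries pushed to cache
        return ram_only, ram_l2arc, l2arc_cost

    # Compare 4k vs 128k blocks on a hypothetical 8 TiB pool at 2x dedup.
    for bs in (4 * 1024, 128 * 1024):
        ram, ram_l2, l2 = ddt_footprint(8 * TiB, bs, dedup_ratio=2.0)
        print("%4dK blocks: %5.1f GiB RAM alone, or %5.1f GiB RAM + %5.1f GiB L2ARC"
              % (bs // 1024, ram / GiB, ram_l2 / GiB, l2 / GiB))

At 4k blocks this works out to roughly 376 GiB of RAM (or 176 GiB of RAM
plus 376 GiB of L2ARC), versus about 12 GiB of RAM at 128k blocks, which
shows why small blocks dedup better but make the DDT so expensive to hold.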
Great info. Thanks Erik.

For dedupe workloads on larger file systems (8TB+), I wonder if it makes
sense to use SLC / enterprise-class SSD (or better) devices for L2ARC
instead of lower-end MLC stuff? Seems like we'd be seeing more writes to
the device than in a non-dedupe scenario.

Thanks,
Ray