On 5/4/2011 4:17 PM, Ray Van Dolson wrote:
On Wed, May 04, 2011 at 03:49:12PM -0700, Erik Trimble wrote:
On 5/4/2011 2:54 PM, Ray Van Dolson wrote:
On Wed, May 04, 2011 at 12:29:06PM -0700, Erik Trimble wrote:
(2) Block size:  a 4k block size will yield better dedup than a 128k
block size, presuming reasonable data turnover.  This is inherent, as
any single bit change in a block will make it non-duplicated.  With 32x
the block size, there is a much greater chance that a small change in
the data will cause a large loss of dedup ratio.  That is, 4k blocks
should almost always yield much better dedup ratios than larger ones.
Also, remember that the ZFS block size is a SUGGESTION for zfs
filesystems (i.e. it will use UP TO that block size, but not always that
size), but is FIXED for zvols.
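
(As a back-of-the-envelope illustration of that trade-off, here's a quick
Python sketch; the 8 TiB pool size and the one-byte-edit pattern are
made-up numbers, not anything measured on a real pool:)

    # Made-up numbers: how block size changes what a tiny modification
    # costs you in dedup terms.
    data_size = 8 * 2**40                  # 8 TiB of dedupable data (assumption)
    for block_size in (4 * 2**10, 128 * 2**10):
        blocks = data_size // block_size
        # One changed byte un-dedups the whole block it lives in:
        print("%4d KiB blocks: %d candidate blocks, a 1-byte edit "
              "forfeits %d KiB of sharing" % (block_size // 2**10,
                                              blocks, block_size // 2**10))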

(3) Method of storing (and data stored in) the dedup table.
           ZFS's current design is (IMHO) rather piggy on DDT and L2ARC
lookup requirements. Right now, ZFS requires a record in the ARC (RAM)
for each L2ARC (cache) entry, PLUS the actual L2ARC entry.  So, it
boils down to 500+ bytes of combined L2ARC & RAM usage per block entry
in the DDT.  Also, the actual DDT entry itself is perhaps larger than
absolutely necessary.
So the addition of L2ARC doesn't necessarily reduce the need for
memory (at least not much if you're talking about 500 bytes combined)?
I was hoping we could slap in 80GB of SSD L2ARC and get away with
"only" 16GB of RAM for example.
It reduces *somewhat* the need for RAM.  Basically, if you have no L2ARC
cache device, the DDT must be stored in RAM.  That's about 376 bytes per
dedup block.

If you have an L2ARC cache device, then the ARC must contain a reference
to every DDT entry stored in the L2ARC, which consumes 176 bytes per DDT
entry reference.

So, adding a L2ARC reduces the ARC consumption by about 55%.
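
(Rough numbers, as a sanity check -- this is only a sketch using the
per-entry figures above; the 8 TiB pool and 64 KiB average block size
are assumptions:)

    # Sketch only: ARC/L2ARC cost of the DDT, using the figures quoted
    # above (376 bytes per DDT entry held in ARC, 176 bytes per ARC
    # reference when the entry lives in L2ARC).  Pool size and average
    # block size are assumptions.
    pool_used = 8 * 2**40                  # 8 TiB of unique, deduped data
    avg_block = 64 * 2**10                 # assumed average block size
    entries = pool_used // avg_block

    ram_no_l2arc = entries * 376           # whole DDT pinned in ARC
    ram_with_l2arc = entries * 176         # only references pinned in ARC
    l2arc_for_ddt = entries * 376          # roughly, the entries themselves on SSD

    gib = float(2**30)
    print("DDT entries:       %d" % entries)
    print("RAM, no L2ARC:     %.1f GiB" % (ram_no_l2arc / gib))
    print("RAM, with L2ARC:   %.1f GiB" % (ram_with_l2arc / gib))
    print("L2ARC for the DDT: %.1f GiB" % (l2arc_for_ddt / gib))

The 176-vs-376 ratio is where the roughly-55% reduction above comes from.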

Of course, the other benefit from a L2ARC is the data/metadata caching,
which is likely worth it just by itself.
Great info.  Thanks Erik.

For dedupe workloads on larger file systems (8TB+), I wonder if it makes
sense to use SLC / enterprise class SSD (or better) devices for L2ARC
instead of lower-end MLC stuff?  Seems like we'd be seeing more writes
to the device than in a non-dedupe scenario.

Thanks,
Ray
I'm using Enterprise-class MLC drives (without a supercap), and they work fine with dedup. I'd have to test, but I don't think the increase in writes is that much, so I don't expect an SLC to really make much of a difference. (The fill rate of the L2ARC is limited, so I can't imagine we'd bump up against the MLC's limits.)
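
(For what it's worth, here's a quick sketch of the write-load ceiling --
the 8 MB/s l2arc_write_max default and the drive endurance numbers are
assumptions on my part, so check your own tunables and SSD specs:)

    # Sketch only: worst-case L2ARC fill rate vs. MLC endurance.
    # Assumes the l2arc_write_max feed cap is at its (tunable) default
    # of 8 MB/s, and a hypothetical 80 GB drive rated for 3000 P/E
    # cycles; real workloads rarely sustain the cap continuously.
    fill_rate = 8 * 10**6                  # bytes/sec fed to L2ARC, worst case
    daily_writes = fill_rate * 86400       # ~0.7 TB/day at the cap
    drive_size = 80 * 10**9
    pe_cycles = 3000
    lifetime_bytes = drive_size * pe_cycles    # ignores write amplification
    days = lifetime_bytes / float(daily_writes)
    print("worst-case writes/day: %.2f TB" % (daily_writes / 1e12))
    print("naive drive lifetime:  %.0f days at a sustained cap" % days)

Even that pessimistic case comes out on the order of a year, and the
actual feed rate is usually a small fraction of the cap, which is why I
don't expect to hit the MLC's limits in practice.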

--

Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
