On 7/10/2010 5:24 AM, Richard Elling wrote:
On Jul 9, 2010, at 11:10 PM, Brandon High wrote:
On Fri, Jul 9, 2010 at 5:18 PM, Brandon High <bh...@freaks.com> wrote:
I think that DDT entries are a little bigger than what you're using. The size
seems to range between 150 and 250 bytes depending on how it's calculated; call
it 200 bytes each. Your 128G dataset would require closer to 200M (+/- 25%) for
the DDT if your data were completely unique. 1TB of unique data would require
600M - 1000M for the DDT.
Using 376 bytes per entry, it's 376M for 128G of unique data, or just under 3GB
for 1TB of unique data.
4% seems to be a pretty good SWAG.
A 1TB zvol with 8k blocks would require almost 24GB of memory to hold the DDT.
Ouch.
... or more than 300GB for 512-byte records.
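For anyone who wants to check the arithmetic above, here is a rough Python
sketch. It assumes one DDT entry per unique block, binary units (1G = 2^30
bytes), and a 128K block size for the 128G and 1TB figures (the ZFS default
recordsize; the thread does not state it explicitly). The function names are
made up for illustration, and the per-entry sizes are just the estimates quoted
in this thread, not values read from ZFS.

```python
# Back-of-the-envelope DDT sizing. All inputs are assumptions taken from the
# estimates in this thread; nothing here queries a real pool.
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4


def ddt_bytes(unique_data_bytes, block_bytes, entry_bytes):
    """Estimated DDT size, assuming every block is unique (one entry each)."""
    entries = unique_data_bytes // block_bytes
    return entries * entry_bytes


def ddt_fraction(block_bytes, entry_bytes):
    """DDT size as a fraction of the unique data it describes."""
    return entry_bytes / block_bytes


# (label, amount of unique data, block size) for the cases discussed above.
cases = [
    ("128G, 128K blocks", 128 * GIB, 128 * KIB),
    ("1T, 128K blocks", TIB, 128 * KIB),
    ("1T, 8K blocks", TIB, 8 * KIB),
    ("1T, 512-byte blocks", TIB, 512),
]

for label, data, block in cases:
    for entry in (200, 376):  # bytes per DDT entry, per the estimates above
        size = ddt_bytes(data, block, entry)
        print(f"{label}, {entry} B/entry: {size / GIB:.2f} GiB "
              f"({ddt_fraction(block, entry):.2%} of data)")
```

The point of the sketch is that the DDT scales with the number of blocks, not
the amount of data, so shrinking the block size from 128K to 8K multiplies the
table size by 16 for the same pool.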
The performance issue is that DDT access tends to be random. This implies that
if you don't have a lot of RAM and your pool has poor random read I/O
performance, then you will not be impressed with dedup performance. In other
words, trying to dedup lots of data on a small DRAM machine using big, slow pool
HDDs will not set any benchmark records. By contrast, using SSDs for the pool
can demonstrate good random read performance. As the price per bit of HDDs
continues to drop, the value of deduping pools using HDDs also drops.
-- richard
Which brings up an interesting idea: if I have a pool with good random
I/O (perhaps made from SSDs, or even one of those nifty Oracle F5100
things), I would probably not want to have a DDT created, or at least
would want one that was very significantly abbreviated. What capability
does ZFS have for recognizing that we won't need a full DDT created for
high-I/O-speed pools? This matters particularly because such pools would
almost certainly be heavy candidates for dedup (the $/GB being
significantly higher than for other media, and thus space being at a
premium).
I'm not familiar enough with how the DDT gets built and referenced to
understand how this might happen. But I can certainly see it being
useful to tell ZFS (perhaps through a pool property?) that building an
in-ARC DDT isn't really needed.
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss