On 7/10/2010 5:24 AM, Richard Elling wrote:
On Jul 9, 2010, at 11:10 PM, Brandon High wrote:

On Fri, Jul 9, 2010 at 5:18 PM, Brandon High <bh...@freaks.com> wrote:
I think that DDT entries are a little bigger than what you're using. The size
seems to range between 150 and 250 bytes depending on how it's calculated; call
it 200 bytes each. Your 128G dataset would require closer to 200M (+/- 25%) for
the DDT if your data were completely unique. 1TB of unique data would require
600M - 1000M for the DDT.

Using 376 bytes per entry, it's 376M for 128G of unique data, or just under 3GB
for 1TB of unique data.
4% seems to be a pretty good SWAG.

A 1TB zvol with 8k blocks would require almost 24GB of memory to hold the DDT. 
Ouch.
... or more than 300GB for 512-byte records.
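
For anyone who wants to redo this arithmetic with their own numbers, here is a rough sketch. It assumes one DDT entry per unique block and nothing else; the 128K record size for the first two cases is an assumption (it is what the 376M-for-128G figure implies), and the function name is made up for illustration.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def ddt_bytes(unique_data_bytes, block_size_bytes, entry_bytes):
    """Approximate DDT size, assuming exactly one entry per unique block."""
    entries = unique_data_bytes // block_size_bytes
    return entries * entry_bytes


# (label, amount of unique data, block size) -- block sizes are assumptions
cases = [
    ("128G @ 128K", 128 * GIB, 128 * KIB),
    ("1T   @ 128K", 1 * TIB, 128 * KIB),
    ("1T   @ 8K", 1 * TIB, 8 * KIB),
    ("1T   @ 512B", 1 * TIB, 512),
]

for label, data, block in cases:
    for entry in (200, 376):  # the two per-entry sizes used in this thread
        size = ddt_bytes(data, block, entry)
        print(f"{label:12s} {entry:3d} bytes/entry -> {size / GIB:8.2f} GiB")
```

Under these assumptions the 8k case comes out at about 25 GiB with 200-byte entries and about 47 GiB with 376-byte entries, so the result is quite sensitive to which per-entry size you believe.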

The performance issue is that DDT access tends to be random. This implies that
if you don't have a lot of RAM and your pool has poor random read I/O
performance, then you will not be impressed with dedup performance. In other
words, trying to dedup lots of data on a small DRAM machine using big, slow
pool HDDs will not set any benchmark records. By contrast, using SSDs for the
pool can demonstrate good random read performance. As the price per bit of
HDDs continues to drop, the value of deduping pools using HDDs also drops.
  -- richard
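
To put a rough number on the random-read argument above, here is a back-of-the-envelope sketch. It is a deliberately crude model, not how ZFS actually schedules I/O: it assumes every DDT lookup that misses in RAM costs exactly one random read from the pool and that nothing else limits throughput. The function name and the IOPS figures are hypothetical, chosen only for illustration.

```python
def dedup_write_ceiling_mb_s(random_read_iops, block_size_bytes, ddt_miss_ratio):
    """Upper bound on dedup write throughput in MB/s.

    Crude model (an assumption, not ZFS internals): each block written needs
    one DDT lookup, and each lookup that misses in RAM costs one random read.
    """
    if ddt_miss_ratio <= 0:
        return float("inf")  # DDT fully cached: lookups are not the limit
    blocks_per_sec = random_read_iops / ddt_miss_ratio
    return blocks_per_sec * block_size_bytes / 1e6


# Hypothetical pools: a few HDDs vs. an SSD pool, 128K blocks,
# with half of all DDT lookups missing in RAM.
for name, iops in [("HDD pool", 200), ("SSD pool", 20000)]:
    ceiling = dedup_write_ceiling_mb_s(iops, 128 * 1024, 0.5)
    print(f"{name}: at most ~{ceiling:.0f} MB/s")
```

The point of the sketch is only the shape of the relationship: the ceiling scales linearly with random read IOPS and inversely with the fraction of lookups that miss, which is why either more RAM or a faster pool helps.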


Which brings up an interesting idea: if I have a pool with good random I/O (perhaps made from SSDs, or even one of those nifty Oracle F5100 things), I would probably not want to have a DDT created, or at least I'd want one that was very significantly abbreviated. What capability does ZFS have for recognizing that we won't need a full DDT created for high-I/O-speed pools? This matters particularly because such pools would almost certainly be heavy candidates for dedup (the $/GB being significantly higher than for other media, and thus space being at a premium).

I don't know enough about how the DDT gets built and referenced to understand how this might happen. But I can certainly see it being useful to tell ZFS (perhaps through a pool property?) that building an in-ARC DDT isn't really needed.

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
