Erik Trimble wrote:
Roy Sigurd Karlsbakk wrote:
Hi all

I've been doing a lot of testing with dedup and have concluded it's not really ready for production. If something fails, it can render the pool unusable for hours or maybe days, perhaps due to single-threaded code in ZFS. There is also very little data available in the docs (beyond what I've gathered from this list) on how much memory one should have for deduping an x TiB dataset.
I think it was Richard, a month or so ago, who had a good post about how much space a Dedup Table (DDT) entry takes (it was in a discussion where I asked about it). I can't remember the exact figure (a couple hundred bytes?) per DDT entry, but one has to remember that each entry is for a single block (slab), which can vary in size (512 bytes to 128k). So there's no good generic formula for X bytes of RAM per Y TB of pool space. You can compute a rough guess if you know what kind of data and what general usage pattern the pool will see (basically, you need to take a stab at how big you think the average block size is). Also, remember that if you have a /very/ good dedup ratio, you will have a smaller DDT for a given pool size than a pool with a poor dedup ratio. Unfortunately, there's no magic bullet, though if you can dig up Richard's post, you should be able to make a guess and not be off by more than 2x or so. Also, remember you only need to hold the DDT in L2ARC, not in actual RAM, so buy that SSD, young man!
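For a rough back-of-the-envelope version of that guess, something like the following sketch works (the ~250 bytes per in-core DDT entry comes from Richard's post quoted below; the 10 TiB pool and 64 KiB average block size are just assumed example inputs, not recommendations):

# Worst-case DDT sizing sketch: assumes every block is unique
# (dedup ratio 1.0) and ~250 bytes of in-core space per DDT entry.

BYTES_PER_DDT_ENTRY = 250            # approximate in-core size per entry

def ddt_size_bytes(pool_bytes, avg_block_bytes, dedup_ratio=1.0):
    """Estimate the in-core DDT size for a pool.

    pool_bytes      -- amount of data stored in the pool
    avg_block_bytes -- your guess at the average block size (512 .. 128K)
    dedup_ratio     -- better dedup means fewer unique blocks; 1.0 is worst case
    """
    unique_blocks = (pool_bytes / avg_block_bytes) / dedup_ratio
    return unique_blocks * BYTES_PER_DDT_ENTRY

# Example: 10 TiB of data, guessing a 64 KiB average block size.
print(ddt_size_bytes(10 * 2**40, 64 * 2**10) / 2**30, "GiB")   # ~39 GiB

That whole amount doesn't have to sit in RAM, but it does need to fit comfortably in ARC + L2ARC if you want dedup (and especially deletes) to stay fast.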

As far as failures, well, I can't speak to that specifically. Though, do realize that not having sufficient L2ARC/RAM to hold the DDT does mean that you spend an awful amount of time reading pool metadata, which really hurts performance (not to mention can cripple deleting of any sort...)

Here's Richard Elling's post in the "dedup and memory/l2arc requirements" thread, where he presents a worst-case upper bound on the DDT size:
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-April/039516.html

------start of copy------

You can estimate the amount of disk space needed for the deduplication table
and the expected deduplication ratio by using "zdb -S poolname" on your existing
pool.  Be patient; for an existing pool with lots of objects, this can take
some time to run.

# ptime zdb -S zwimming
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
    1    2.27M    239G    188G    194G    2.27M    239G    188G    194G
    2     327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
    4    30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
    8    7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
   16      673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
   32      197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
   64       47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
  128       22    908K    250K    251K    3.71K    150M   40.3M   40.3M
  256        7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
  512        4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
   2K        1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
   8K        1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G

dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50


real     8:02.391932786
user     1:24.231855093
sys        15.193256108

In this file system, 2.75 million blocks are allocated. The in-core size
of a DDT entry is approximately 250 bytes.  So the math is pretty simple:
        in-core size = 2.63M * 250 = 657.5 MB

If your dedup ratio is 1.0, then this number will scale linearly with size.
If the dedup ratio is > 1.0, then this number will scale less than linearly,
so you can use the linear scale as a worst-case approximation.
-- richard

------end of copy------
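To make that worst-case scaling concrete, here's a small sketch applying the numbers from the quoted output (the 10 TiB target pool is just an assumed example, and scaling by logical size, LSIZE, is an assumption):

# Apply Richard's ~250-byte-per-entry estimate to the zdb -S totals above,
# then scale linearly (worst case, dedup ratio 1.0) to a larger pool.

BYTES_PER_DDT_ENTRY = 250                  # approximate in-core entry size

allocated_blocks = 2.63e6                  # "Total ... blocks" from zdb -S
allocated_bytes  = 277 * 2**30             # "Total ... LSIZE" (277G)

ddt_now = allocated_blocks * BYTES_PER_DDT_ENTRY
print(f"current DDT: {ddt_now / 1e6:.1f} MB")              # 657.5 MB

# Worst-case projection for, say, a 10 TiB pool holding similar data:
target_bytes = 10 * 2**40
ddt_projected = ddt_now * target_bytes / allocated_bytes
print(f"projected DDT: {ddt_projected / 1e9:.1f} GB")      # ~24.3 GB

If the real dedup ratio ends up above 1.0, the actual table will be smaller than that projection.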

