Erik Trimble wrote:
Roy Sigurd Karlsbakk wrote:
Hi all

I've been doing a lot of testing with dedup and have concluded it's not really ready for production. If something fails, it can render the pool unusable for hours or maybe days, perhaps due to single-threaded code in ZFS. There is also very little data available in the docs (beyond what I've gathered from this list) on how much memory one should have for deduping an x TiB dataset.
I think it was Richard, a month or so ago, who had a good post about how much space a Dedup Table (DDT) entry takes (it was in a discussion where I asked about it). I can't remember the exact figure (a couple hundred bytes?) per DDT entry, but one has to remember that each entry is for a single block (slab), which can vary in size (512 bytes to 128k). So there's no good generic formula for X bytes of RAM per Y TB of pool space. You can compute a rough guess if you know what kind of data and what general usage pattern the pool will see (basically, you need to take a stab at how big you think the average block size is). Also, remember that if you have a /very/ good dedup ratio, you will have a smaller DDT for a given pool size than a pool with a poor dedup ratio. Unfortunately, there's no magic bullet, though if you can dig up Richard's post, you should be able to make a guess and not be off by more than 2x or so. Also, remember you only need to hold the DDT in L2ARC, not in actual RAM, so buy that SSD, young man!
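For a rough back-of-the-envelope version of that guess, something like the following sketch works (the ~250 bytes per in-core DDT entry comes from Richard's post quoted below; the 10 TiB pool and 64 KiB average block size are just assumed example inputs, not recommendations):

# Worst-case DDT sizing sketch: assumes every block is unique
# (dedup ratio 1.0) and ~250 bytes of in-core space per DDT entry.

BYTES_PER_DDT_ENTRY = 250            # approximate in-core size per entry

def ddt_size_bytes(pool_bytes, avg_block_bytes, dedup_ratio=1.0):
    """Estimate the in-core DDT size for a pool.

    pool_bytes      -- amount of data stored in the pool
    avg_block_bytes -- your guess at the average block size (512 .. 128K)
    dedup_ratio     -- better dedup means fewer unique blocks; 1.0 is worst case
    """
    unique_blocks = (pool_bytes / avg_block_bytes) / dedup_ratio
    return unique_blocks * BYTES_PER_DDT_ENTRY

# Example: 10 TiB of data, guessing a 64 KiB average block size.
print(ddt_size_bytes(10 * 2**40, 64 * 2**10) / 2**30, "GiB")   # ~39 GiB

That whole amount doesn't have to sit in RAM, but it does need to fit comfortably in ARC + L2ARC if you want dedup (and especially deletes) to stay fast.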

As far as failures, well, I can't speak to that specifically. Though, do realize that not having sufficient L2ARC/RAM to hold the DDT does mean that you spend an awful amount of time reading pool metadata, which really hurts performance (not to mention can cripple deleting of any sort...)

Here's Richard Elling's post in the "dedup and memory/l2arc requirements" thread, where he presents a worst-case upper bound on the DDT size:
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-April/039516.html

------start of copy------

You can estimate the amount of disk space needed for the deduplication table
and the expected deduplication ratio by using "zdb -S poolname" on your existing
pool.  Be patient; for an existing pool with lots of objects, this can take
some time to run.

# ptime zdb -S zwimming
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
    1    2.27M    239G    188G    194G    2.27M    239G    188G    194G
    2     327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
    4    30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
    8    7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
   16      673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
   32      197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
   64       47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
  128       22    908K    250K    251K    3.71K    150M   40.3M   40.3M
  256        7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
  512        4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
   2K        1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
   8K        1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G

dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50


real     8:02.391932786
user     1:24.231855093
sys        15.193256108

In this file system, 2.75 million blocks are allocated. The in-core size
of a DDT entry is approximately 250 bytes.  So the math is pretty simple:
        in-core size = 2.63M * 250 = 657.5 MB

If your dedup ratio is 1.0, then this number will scale linearly with size.
If the dedup ratio is > 1.0, then this number will scale less than linearly,
so you can use the linear scale as a worst-case approximation.
-- richard

------end of copy------
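To make that worst-case scaling concrete, here's a small sketch applying the numbers from the quoted output (the 10 TiB target pool is just an assumed example, and scaling by logical size, LSIZE, is an assumption):

# Apply Richard's ~250-byte-per-entry estimate to the zdb -S totals above,
# then scale linearly (worst case, dedup ratio 1.0) to a larger pool.

BYTES_PER_DDT_ENTRY = 250                  # approximate in-core entry size

allocated_blocks = 2.63e6                  # "Total ... blocks" from zdb -S
allocated_bytes  = 277 * 2**30             # "Total ... LSIZE" (277G)

ddt_now = allocated_blocks * BYTES_PER_DDT_ENTRY
print(f"current DDT: {ddt_now / 1e6:.1f} MB")              # 657.5 MB

# Worst-case projection for, say, a 10 TiB pool holding similar data:
target_bytes = 10 * 2**40
ddt_projected = ddt_now * target_bytes / allocated_bytes
print(f"projected DDT: {ddt_projected / 1e9:.1f} GB")      # ~24.3 GB

If the real dedup ratio ends up above 1.0, the actual table will be smaller than that projection.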

