Erik Trimble wrote:
Roy Sigurd Karlsbakk wrote:
Hi all
I've been doing a lot of testing with dedup and have concluded it's not
really ready for production. If something fails, it can render the
pool unusable for hours or maybe days, perhaps due to single-threaded
code paths in zfs. There is also very little data available in the docs
(beyond what I've picked up on this list) on how much memory one
should have for deduping an xTiB dataset.
I think it was Richard a month or so ago who had a good post about
how much space a Dedup Table entry would take (it was in some
discussion where I asked about it). I can't remember what it was (a
hundred bytes?) per DDT entry, but one has to remember that each entry
is for a slab, which can vary in size (512 bytes to 128k). So
there's no good generic formula for X bytes in RAM per Y TB of space.
You can compute a rough guess if you know what kind of data you have
and what the general usage pattern for the pool is (basically, you
need to take a stab at how big you think the average slab size is).
Also, remember that if you have a /very/ good dedup ratio, you will
have a smaller DDT for a given pool size than a pool with a poor
dedup ratio. Unfortunately, there's no magic bullet, though if you
can dig up Richard's post, you should be able to take a guess and not
be off by more than 2x or so.
Also, remember you only need to hold the DDT in L2ARC, not in actual
RAM, so buy that SSD, young man!
As far as failures go, I can't speak to that specifically. Do
realize, though, that not having enough L2ARC/RAM to hold the DDT
means you spend an awful lot of time reading pool metadata, which
really hurts performance (not to mention that it can cripple deletes
of any sort...)
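
For what it's worth, here's a rough back-of-the-envelope sketch
(Python, purely illustrative) of the kind of estimate Erik describes.
The ~250 bytes per in-core DDT entry is the figure from Richard's post
quoted below; the pool size, average slab size, and dedup ratio are
made-up example inputs, so plug in your own numbers:

# Rough DDT sizing sketch. Assumption: ~250 bytes per in-core DDT
# entry (Richard's figure, quoted below). Example inputs are hypothetical.

DDT_ENTRY_BYTES = 250  # approximate in-core size of one DDT entry

def ddt_size_bytes(pool_used_bytes, avg_slab_bytes, dedup_ratio=1.0):
    # One DDT entry per unique slab; a better dedup ratio means fewer
    # unique slabs and therefore a smaller table.
    unique_slabs = pool_used_bytes / avg_slab_bytes / dedup_ratio
    return unique_slabs * DDT_ENTRY_BYTES

TiB = 2 ** 40
KiB = 2 ** 10
# Example: 10 TiB of data, 64 KiB average slab, dedup ratio 1.5
est = ddt_size_bytes(10 * TiB, 64 * KiB, dedup_ratio=1.5)
print("estimated DDT size: %.1f GiB" % (est / 2 ** 30))   # ~26 GiB

With small average slabs the table grows very quickly, which is
exactly why the L2ARC suggestion above matters.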
Here's Richard Elling's post in the "dedup and memory/l2arc
requirements" thread, where he presents a worst-case upper bound on the DDT size:
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-April/039516.html
------start of copy------
You can estimate the amount of disk space needed for the deduplication table
and the expected deduplication ratio by using "zdb -S poolname" on your existing
pool. Be patient, for an existing pool with lots of objects, this can take
some time to run.
# ptime zdb -S zwimming
Simulated DDT histogram:
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    2.27M    239G    188G    194G    2.27M    239G    188G    194G
     2     327K   34.3G   27.8G   28.1G     698K   73.3G   59.2G   59.9G
     4    30.1K   2.91G   2.10G   2.11G     152K   14.9G   10.6G   10.6G
     8    7.73K    691M    529M    529M    74.5K   6.25G   4.79G   4.80G
    16      673   43.7M   25.8M   25.9M    13.1K    822M    492M    494M
    32      197   12.3M   7.02M   7.03M    7.66K    480M    269M    270M
    64       47   1.27M    626K    626K    3.86K    103M   51.2M   51.2M
   128       22    908K    250K    251K    3.71K    150M   40.3M   40.3M
   256        7    302K     48K   53.7K    2.27K   88.6M   17.3M   19.5M
   512        4    131K   7.50K   7.75K    2.74K    102M   5.62M   5.79M
    2K        1      2K      2K      2K    3.23K   6.47M   6.47M   6.47M
    8K        1    128K      5K      5K    13.9K   1.74G   69.5M   69.5M
 Total    2.63M    277G    218G    225G    3.22M    337G    263G    270G
dedup = 1.20, compress = 1.28, copies = 1.03, dedup * compress / copies = 1.50
real 8:02.391932786
user 1:24.231855093
sys 15.193256108
In this file system, 2.63 million blocks are allocated (the Total row
above). The in-core size of a DDT entry is approximately 250 bytes.
So the math is pretty simple:
in-core size = 2.63M * 250 = 657.5 MB
If your dedup ratio is 1.0, then this number will scale linearly with size.
If the dedup ratio is > 1.0, then this number will not scale linearly; it
will be less. So you can use the linear scale as a worst-case approximation.
-- richard
------end of copy------
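
To put numbers on that worst-case linear scaling, here is a second
small sketch (Python again, just an illustration). It reproduces the
arithmetic from the post above and then extrapolates linearly, i.e.
assumes a dedup ratio of 1.0, to a hypothetical pool holding four
times as much data:

# Reproduce the in-core DDT size from Richard's example and extrapolate
# linearly as a worst case. The 4x factor is a made-up example.

DDT_ENTRY_BYTES = 250           # approximate in-core size per DDT entry
allocated_blocks = 2.63e6       # "Total" allocated blocks from zdb -S above

in_core = allocated_blocks * DDT_ENTRY_BYTES
print("in-core DDT size: %.1f MB" % (in_core / 1e6))        # 657.5 MB

avg_block = 225e9 / allocated_blocks    # allocated DSIZE (~225G) / blocks
print("average allocated block size: ~%.0f KB" % (avg_block / 1e3))  # ~86 KB

scale = 4.0                     # e.g. a pool with 4x as much allocated data
print("worst-case DDT at %gx the data: %.1f MB" % (scale, scale * in_core / 1e6))

Note this really is the worst case: the same amount of data with a
better dedup ratio would need proportionally fewer DDT entries.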