Enda O'Connor wrote:
it works at a pool-wide level, with the ability to exclude at the dataset
level - or the converse: if dedup is set to off on the top-level dataset,
you can then set it to on for lower-level datasets. In other words, one can
include and exclude depending on the datasets' contents.
so largefile will get deduped in the example below.
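For instance, a quick sketch of the converse case against a hypothetical
pool called tank (dataset names are just for illustration - the key point
is that children inherit the property unless they set it locally):

bash-3.2# zfs set dedup=off tank          # top-level dataset: dedup disabled
bash-3.2# zfs set dedup=on tank/dir1      # child dataset opts back in
bash-3.2# zfs get -r dedup tank           # SOURCE column shows local vs. inherited

Here only writes to tank/dir1 go through the dedup code; tank/dir2 and
tank/dir3 inherit the off setting from tank.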
And you can use 'zdb -S' (which is a lot better now than it used to be
before dedup) to see how much benefit there is, without even turning
dedup on:
bash-3.2# zdb -S rpool
Simulated DDT histogram:
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     625K    9.9G   7.90G   7.90G     625K    9.9G   7.90G   7.90G
     2     9.8K    184M    132M    132M    20.7K    386M    277M    277M
     4    1.21K   16.6M   10.8M   10.8M    5.71K   76.9M   48.6M   48.6M
     8      395    764K    745K    745K    3.75K   6.90M   6.69M   6.69M
    16      125   2.71M    888K    888K    2.60K   54.2M   17.9M   17.9M
    32       56   2.10M    750K    750K    2.33K   85.6M   29.8M   29.8M
    64        9   22.0K   22.0K   22.0K      778   2.04M   2.04M   2.04M
   128        4   6.00K   6.00K   6.00K      594    853K    853K    853K
   256        2      8K      8K      8K      711   2.78M   2.78M   2.78M
   512        2   4.50K   4.50K   4.50K    1.47K   3.52M   3.52M   3.52M
    8K        1    128K    128K    128K    15.9K   1.99G   1.99G   1.99G
   16K        2      8K      8K      8K    50.7K    203M    203M    203M
 Total     637K   10.1G   8.04G   8.04G     730K   12.7G   10.5G   10.5G

dedup = 1.30, compress = 1.22, copies = 1.00, dedup * compress / copies = 1.58
bash-3.2#
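Once dedup is actually enabled and new data has been written, the pool
reports the real (not simulated) ratio as well - a minimal check, assuming
I remember the property name right:

bash-3.2# zpool get dedupratio rpool      # actual dedup ratio for the pool
bash-3.2# zpool list rpool                # DEDUP column shows the same value

Note this only covers data written after dedup was turned on; existing
blocks are not rewritten.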
Be careful - dedup can eat lots of RAM, since the dedup table (DDT) wants
to stay cached in memory!
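A rough back-of-the-envelope sketch, assuming a ballpark of ~320 bytes of
core per DDT entry (the exact per-entry cost varies), for the pool above:

    637K allocated blocks x ~320 bytes/entry ~= 200 MB of DDT

Harmless at this size, but a multi-TB pool full of small blocks can push
the table into many GB.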
Many thanks to Jeff and all the team!
Regards,
Victor
Breandan Dezendorf wrote:
Does dedup work at the pool level or the filesystem/dataset level?
For example, if I were to do this:
bash-3.2$ mkfile 100m /tmp/largefile
bash-3.2$ zfs set dedup=off tank
bash-3.2$ zfs set dedup=on tank/dir1
bash-3.2$ zfs set dedup=on tank/dir2
bash-3.2$ zfs set dedup=on tank/dir3
bash-3.2$ cp /tmp/largefile /tank/dir1/largefile
bash-3.2$ cp /tmp/largefile /tank/dir2/largefile
bash-3.2$ cp /tmp/largefile /tank/dir3/largefile
Would largefile get dedup'ed? Would I need to set dedup on for the
pool, and then disable where it isn't wanted/needed?
Also, will we need to move our data around (send/recv or whatever your
preferred method is) to take advantage of dedup? I was hoping the
blockpointer rewrite code would allow an admin to simply turn on dedup
and let ZFS process the pool, eliminating excess redundancy as it
went.