Jeremy Kitchen wrote:
On Nov 2, 2009, at 9:07 AM, Victor Latushkin wrote:
Enda O'Connor wrote:
it works at a pool-wide level with the ability to exclude at a dataset
level, or the converse: if it is set to off at the top-level dataset,
lower-level datasets can then be set to on. In other words, one can
include or exclude datasets depending on their contents.
so largefile will get deduped in the example below.
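As a rough illustration (the pool 'tank' and the dataset names here are
hypothetical, not Enda's original example), the pool-wide setting with a
per-dataset exclusion could look like:

  zfs set dedup=on tank            # top-level dataset: dedup on by default
  zfs set dedup=off tank/scratch   # exclude this dataset from dedup
  zfs get -r dedup tank            # children such as tank/largefile inherit dedup=on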
And you can use 'zdb -S' (which is a lot better now than it used to be
before dedup) to see how much benefit there is (without even turning
dedup on):
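For instance, against a hypothetical pool named 'tank':

  zdb -S tank   # simulate dedup on an existing pool and print the DDT histogram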
forgive my ignorance, but what's the advantage of this new dedup over
the existing compression option? Wouldn't full-filesystem compression
naturally de-dupe?
See this for example:
Simulated DDT histogram:
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     625K    9.9G   7.90G   7.90G     625K    9.9G   7.90G   7.90G
     2     9.8K    184M    132M    132M    20.7K    386M    277M    277M
"Allocated" means what is actually allocated on disk with deduplication;
"referenced" means what would be allocated on disk without it. LSIZE
denotes the logical size, and PSIZE denotes the physical size after
compression.
The row with reference count 1 shows the same figures in both "allocated"
and "referenced", and this is expected - there is only one reference to
each of those blocks, so there is nothing to deduplicate.
But the row with reference count 2 shows a good difference - without
deduplication there are 20.7 thousand blocks on disk, with a logical size
totalling 386M and a physical size after compression of 277M. With
deduplication there would be only 9.8 thousand blocks on disk (a dedup
factor of over 2x for this bucket!), with a logical size totalling 184M
and a physical size of 132M.
So with compression but without deduplication those blocks take 277M on
disk; with deduplication they would take only 132M - a good saving!
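Spelling out the arithmetic for that refcnt=2 bucket (using the shell
purely as a calculator on the rounded figures above):

  echo "scale=2; 20.7 / 9.8" | bc   # referenced / allocated blocks = ~2.11x dedup factor
  echo "$((277 - 132))M"            # referenced PSIZE - allocated PSIZE = 145M saved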
Hope this helps,
victor