Dennis Clarke wrote:
Does the dedupe functionality happen at the file level or a lower block
level?

Block level, but remember that the block size may vary from file to file.

I am writing a large number of files that have the following structure:

------ file begins
1024 lines of random ASCII chars 64 chars long
some tilde chars .. about 1000 of them
some text ( English ) for 2K
more text ( English ) for 700 bytes or so
------------------

Each file ends with the same tilde chars and English text, which follow
64K of random character data.

Before writing the data I see:

# zpool get size,capacity,version,dedupratio,free,allocated zp_dd
NAME   PROPERTY    VALUE    SOURCE
zp_dd  size        67.5G    -
zp_dd  capacity    6%       -
zp_dd  version     21       default
zp_dd  dedupratio  1.16x    -
zp_dd  free        63.3G    -
zp_dd  allocated   4.19G    -

After writing the data I see this:

# zpool get size,capacity,version,dedupratio,free,allocated zp_dd
NAME   PROPERTY    VALUE    SOURCE
zp_dd  size        67.5G    -
zp_dd  capacity    6%       -
zp_dd  version     21       default
zp_dd  dedupratio  1.11x    -
zp_dd  free        63.1G    -
zp_dd  allocated   4.36G    -


Note the drop in dedup ratio from 1.16x to 1.11x, which seems to indicate
that dedupe does not detect that the English text is identical in every file.




Theory: Each of your files may end up in one large 128K block, or maybe a couple of 64K blocks, where there isn't much redundancy to de-dup.
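
A rough way to check this theory offline is to model fixed-size block dedup in a few lines of Python. This is only a sketch, not how ZFS is implemented: it assumes dedup compares whole-block checksums, and the names make_file, dedup_ratio and TAIL are made up for illustration. The tail bytes are a stand-in for the tildes and English text.

```python
import hashlib
import os

# Hypothetical stand-in for the shared tail: ~1000 tildes plus ~2.7K of text.
TAIL = b"~" * 1000 + b"x" * 2048 + b"y" * 700

def make_file(head_len=64 * 1024):
    """Return one simulated file: a unique random head plus the shared tail."""
    return os.urandom(head_len) + TAIL

def dedup_ratio(files, block_size):
    """Bytes referenced divided by bytes stored after fixed-size block dedup."""
    referenced = 0
    unique = {}  # block checksum -> block length
    for data in files:
        # Cut each file into fixed-size blocks; the last block may be short.
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            referenced += len(block)
            unique[hashlib.sha256(block).digest()] = len(block)
    return referenced / sum(unique.values())

if __name__ == "__main__":
    files = [make_file() for _ in range(100)]
    for size in (128 * 1024, 64 * 1024):
        print(size, round(dedup_ratio(files, size), 3))
```

With a 128K block size each simulated file fits in a single block that starts with random data, so no two blocks match and the ratio stays at 1.0. With 64K blocks the shared tail only dedups if it starts exactly on a block boundary; if the random lines also carry newlines (for example make_file(head_len=1024 * 65)), the tail shifts and nothing matches in this model.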

-tim

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
