Dennis Clarke wrote:
Does the dedupe functionality happen at the file level or a lower block level?
Block level, but remember that the block size may vary from file to file.
I am writing a large number of files that have the following structure:

------ file begins
1024 lines of random ASCII chars, 64 chars long
some tilde chars .. about 1000 of them
some text (English) for 2K
more text (English) for 700 bytes or so
------------------

Each file has the same tilde chars and then English text at the end of 64K of random character data.

Before writing the data I see:

# zpool get size,capacity,version,dedupratio,free,allocated zp_dd
NAME   PROPERTY    VALUE    SOURCE
zp_dd  size        67.5G    -
zp_dd  capacity    6%       -
zp_dd  version     21       default
zp_dd  dedupratio  1.16x    -
zp_dd  free        63.3G    -
zp_dd  allocated   4.19G    -

After I see this:

# zpool get size,capacity,version,dedupratio,free,allocated zp_dd
NAME   PROPERTY    VALUE    SOURCE
zp_dd  size        67.5G    -
zp_dd  capacity    6%       -
zp_dd  version     21       default
zp_dd  dedupratio  1.11x    -
zp_dd  free        63.1G    -
zp_dd  allocated   4.36G    -

Note the drop in dedup ratio from 1.16x to 1.11x, which seems to indicate that dedupe does not detect that the English text is identical in every file.
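Theory: each of your files may end up in one large 128K block, or maybe a couple of 64K blocks, where there isn't much redundancy to dedup. Dedup matches whole blocks by checksum, so a block that also contains any of the unique random data will not match a block from another file, even if the tilde and English portions are identical.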
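To illustrate the theory, here is a rough Python sketch of whole-block dedup. It is a simplified fixed-size block model, not ZFS's actual allocation or checksum logic. The helper names (make_file, dedup_ratio), the 20-file count, the use of os.urandom in place of random ASCII, and the exact tail contents are all assumptions made for the example.

```python
import hashlib
import os


def make_file(shared_tail: bytes, random_len: int = 64 * 1024) -> bytes:
    """Build one test file: a unique random prefix plus a tail shared by all files."""
    return os.urandom(random_len) + shared_tail


def dedup_ratio(files, block_size: int) -> float:
    """Return referenced blocks / unique blocks when whole blocks are hashed."""
    total = 0
    unique = set()
    for data in files:
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            unique.add(hashlib.sha256(block).digest())
            total += 1
    return total / len(unique)


# Hypothetical tail: about 1000 tildes followed by roughly 2.7K of text.
shared_tail = b"~" * 1000 + b"some english text " * 150
files = [make_file(shared_tail) for _ in range(20)]

# Each file is about 68K, so with 128K blocks it is a single block that
# includes the random data, and no two blocks can match.
ratio_128k = dedup_ratio(files, 128 * 1024)

# With 4K blocks the 64K random prefix fills exactly 16 blocks, so the shared
# tail lands in a block of its own and is identical across files.
ratio_4k = dedup_ratio(files, 4 * 1024)

print(f"128K blocks: {ratio_128k:.2f}x")
print(f"4K blocks:   {ratio_4k:.2f}x")
```

In this model the shared tail only dedups when it happens to fall on a block boundary and the block holds nothing else. Shift the random prefix by a few bytes and the 4K case would drop back to 1.00x as well.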
-tim