Hello all,

While revisiting my home NAS, which had dedup enabled before I realized its RAM was far too small for the task, I found that there is indeed some deduplication among the data I uploaded to it (which makes sense, since it holds backups of many of the computers I've worked on, and some of my home directories' contents were bound to overlap). However, a lot of the blocks are in fact "unique": they have DDT entries with count=1 and the dedup bit set in their blkptr_t. These blocks are not actually deduplicated, and now that my pouring of backups is complete, they are unlikely ever to become deduplicated.
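(The refcnt=1 row of the DDT histogram that "zdb -DD <pool>" prints seems to be where these entries show up, if I read its output right.) To make the terminology precise, here is what I mean by a "unique but deduped" block in terms of the structures. The types and fields are from sys/ddt.h as I understand it; the helper itself is just made up for illustration:

    #include <sys/ddt.h>

    /*
     * A block is "deduped on paper only" when the total refcount across
     * its DDT entry's ddt_phys_t slots is 1, i.e. exactly one blkptr_t
     * (with its dedup bit set) points at it.  This helper is invented
     * for illustration; only the types and fields come from sys/ddt.h.
     */
    static boolean_t
    ddt_entry_is_unique(const ddt_entry_t *dde)
    {
        uint64_t refcnt = 0;
        int p;

        for (p = 0; p < DDT_PHYS_TYPES; p++)
            refcnt += dde->dde_phys[p].ddp_refcnt;

        return (refcnt == 1 ? B_TRUE : B_FALSE);
    }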
Thus these many unique "deduped" blocks are just a burden: the system has to walk the needlessly large DDT whenever it writes into the dedup-enabled datasets, it has to keep that DDT on disk and in the ARC, and it probably pays for it again during scrubs. These entries bring a lot of headache (or rather performance degradation) for zero gain.

So I thought it would be a nice feature to let ZFS walk the DDT (I wouldn't mind if that required the pool to be offlined/exported), evict the entries with count==1, locate the corresponding block pointers on disk and clear their dedup bits, turning such blocks into regular unique ones. This would require rewriting metadata (a smaller DDT, new block pointers), but it should not touch or reallocate the already-written user data (the blocks' contents) on disk. The new BP without the dedup bit would keep the same contents in its other fields, though its parents would of course change more: new checksums, and new DVAs as the rewritten metadata gets reallocated.

In the end my pool would track as deduped only those blocks which already have two or more references, which, given the "static" nature of such a backup box, should be enough: new full backups of the same source data would still dedup and use no extra space, while unique data would stop wasting resources by being accounted as deduped.

What do you think? (A very rough sketch of the kind of pass I have in mind follows below my signature.)

//Jim
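P.S. Purely as an illustration of the pass described above, nothing like a real patch: ddt_remove() and the BP_SET_DEDUP() macro do exist in the current code as far as I know, but ddt_walk_next_entry(), find_referencing_bp() and rewrite_parent_metadata() are placeholders I made up for this sketch (and all the hard parts hide inside them), with ddt_entry_is_unique() being the helper from the earlier sketch.

    #include <sys/spa.h>
    #include <sys/ddt.h>

    /*
     * Offline pass: drop DDT entries that have only one reference and
     * clear the dedup bit in the single blkptr_t that references each
     * of them.  User data stays where it is; only metadata is rewritten.
     */
    static void
    ddt_prune_unique_entries(spa_t *spa, ddt_t *ddt)
    {
        ddt_entry_t *dde;

        /* hypothetical iterator over all entries of the on-disk DDT */
        while ((dde = ddt_walk_next_entry(ddt)) != NULL) {
            if (!ddt_entry_is_unique(dde))
                continue;    /* refcnt >= 2: keep it deduped */

            /* hypothetical: find the one BP that references this entry */
            blkptr_t *bp = find_referencing_bp(spa, dde);

            /* same DVAs and checksum, just without the dedup flag */
            BP_SET_DEDUP(bp, 0);

            /*
             * hypothetical: parent blocks get new checksums (and, being
             * rewritten, new DVAs) all the way up the block-pointer tree
             */
            rewrite_parent_metadata(spa, bp);

            /* the entry is now useless, drop it from the DDT */
            ddt_remove(ddt, dde);
        }
    }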