On Fri, Jan 22, 2010 at 08:55:16AM +1100, Daniel Carosone wrote:
> For performance (rather than space) issues, I look at dedup as simply
> increasing the size of the working set, with a goal of reducing the
> amount of IO (avoided duplicate writes) in return.
I should add "and avoided future duplicate reads" in those parentheses as well.

A CVS checkout, with identical CVS/Root files in every directory, is a great example. Every one of those files is read on cvs update, and developers often have multiple checkouts (different branches) from the same server. Good performance gains can be had by avoiding potentially many thousands of extra reads and cache entries, whether with dedup or simply by hardlinking those files together (a rough sketch of that approach is below). I've hit the 64k limit on hardlinks to a single file more than once doing this, on BSD FFS.

It's not a great example for my suggestion of a lower blocksize threshold for dedup, however :-/

--
Dan.
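P.S. For illustration only, here is a minimal Python sketch of the hardlinking workaround. The function name, the argument handling, and the temp-file rename are all just examples, and it assumes every checkout sits on the same filesystem (hardlinks cannot cross filesystems). It groups CVS/Root files by content and replaces duplicates with hardlinks to one copy:

#!/usr/bin/env python3
"""Sketch: hardlink identical CVS/Root files across several checkouts."""
import os
import sys
from collections import defaultdict

def hardlink_duplicate_roots(checkout_dirs):
    # Group every CVS/Root file by its contents (they are tiny text files).
    by_content = defaultdict(list)
    for top in checkout_dirs:
        for dirpath, dirnames, filenames in os.walk(top):
            if os.path.basename(dirpath) == "CVS" and "Root" in filenames:
                path = os.path.join(dirpath, "Root")
                with open(path, "rb") as f:
                    by_content[f.read()].append(path)

    for content, paths in by_content.items():
        master = paths[0]
        for dup in paths[1:]:
            # Skip files that already share an inode with the master copy.
            if os.path.samefile(master, dup):
                continue
            try:
                tmp = dup + ".tmp-link"
                os.link(master, tmp)   # can fail at the LINK_MAX limit (e.g. 64k on FFS)
                os.replace(tmp, dup)   # swap the duplicate copy for the hardlink
            except OSError as e:
                print("could not link %s: %s" % (dup, e), file=sys.stderr)

if __name__ == "__main__":
    hardlink_duplicate_roots(sys.argv[1:] or ["."])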