On Fri, Jan 22, 2010 at 08:55:16AM +1100, Daniel Carosone wrote:
> For performance (rather than space) issues, I look at dedup as simply
> increasing the size of the working set, with a goal of reducing the
> amount of IO (avoided duplicate writes) in return.

I should add "and avoided future duplicate reads" in those parentheses
as well. 

A CVS checkout, with identical CVS/Root files in every directory, is a
great example. Every one of those files is read on cvs update.
Developers often have multiple checkouts (different branches) from the
same server. Good performance gains can be had by avoiding potentially
many thousands of extra reads and cache entries, whether with dedup or
simply by hardlinking them all together. I've hit the 64k limit on
hardlinks to a single file more than once doing this, on BSD FFS.
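
A rough Python sketch of that hardlinking approach, for illustration
only: the 65000 ceiling, the link-then-rename step and the function
name are my own assumptions, not anything CVS or the filesystem gives
you.

#!/usr/bin/env python3
# Collapse identical CVS/Root files under a tree into hardlinks.
import os
import sys

LINK_CEILING = 65000  # stay comfortably below the ~64k hardlink limit

def collapse_cvs_roots(top):
    canonical = {}  # file contents -> path of the current "master" copy
    for dirpath, dirnames, filenames in os.walk(top):
        if os.path.basename(dirpath) != "CVS" or "Root" not in filenames:
            continue
        path = os.path.join(dirpath, "Root")
        with open(path, "rb") as f:
            contents = f.read()
        master = canonical.get(contents)
        if master is None:
            canonical[contents] = path
            continue
        st_m, st_p = os.stat(master), os.stat(path)
        if (st_m.st_dev, st_m.st_ino) == (st_p.st_dev, st_p.st_ino):
            continue  # already hardlinked together
        if st_m.st_nlink >= LINK_CEILING:
            canonical[contents] = path  # start a fresh link group
            continue
        # swap the duplicate for a hardlink to the master copy
        tmp = path + ".lntmp"
        os.link(master, tmp)
        os.replace(tmp, path)

if __name__ == "__main__":
    collapse_cvs_roots(sys.argv[1] if len(sys.argv) > 1 else ".")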

It's not a great example for my suggestion of a minimum-blocksize
threshold for dedup, however :-/

--
Dan.
