> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Richard L. Hamilton
>
> I would imagine that if it's read-mostly, it's a win, but
> otherwise it costs more than it saves. Even more conventional
> compression tends to be more resource intensive than decompression...
I would imagine it's *easier* to have a win when it's read-mostly, but the expense of computing checksums is incurred either way, with or without dedup. The only extra cost dedup adds is maintaining a hash tree of some kind, to see whether a block has already been stored on disk (a toy sketch of that idea is at the end of this message). So ... of course I'm speaking hypothetically and haven't proven any of this ... I think dedup will accelerate the system in nearly all use cases. The main exception is when you have highly non-duplicated data. The CPU cost of dedup is tiny, but with highly non-duplicated data even that little expense is wasted.

> What I'm wondering is when dedup is a better value than compression.

Whenever files have internal repetition, compression will be better. Whenever the repetition crosses file boundaries, dedup will be better.

> Most obviously, when there are a lot of identical blocks across different
> files; but I'm not sure how often that happens, aside from maybe
> blocks of zeros (which may well be sparse anyway).

I think the main value is when there is more than one copy of some files in the filesystem. For example:

In Subversion, there are two copies of every file in your working directory: every file has a corresponding "base" copy in the .svn directory. If you have a lot of developers ... software or whatever ... who have all checked out the same project and are working on it in their home directories, all of those copies essentially get cut down to 1. Combine the developers with Subversion and you have 2x copies of every file in every person's home dir = a lot of copies of the same files ... all cut down to 1.

Or you build some package from source: somefile.c becomes somefile.o, and then the linker takes somefile.o and a bunch of other .o files and mashes them together into the "finalproduct" executable. That executable is largely those same .o files mashed together, so again ... cut it all down to 1. And multiply by the number of developers who are all doing the same thing in their home dirs.

Others have mentioned VMs: when VMs are duplicated ... I don't personally duplicate many VMs, so it doesn't matter to me, but I can see the value for others ...
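Since I was hand-waving about the "hash tree" above, here is a toy sketch of the dedup-table idea in Python. It is only an illustration of the concept, not ZFS's actual implementation; the 128 KiB block size, the SHA-256 checksum, and the names (ToyDedupStore, store_file) are just assumptions I made up for the example.

    import hashlib

    BLOCK_SIZE = 128 * 1024  # illustrative block size, not what your pool uses

    class ToyDedupStore:
        def __init__(self):
            self.table = {}   # checksum -> (block id, reference count)
            self.blocks = []  # stand-in for blocks actually written to disk

        def write(self, data):
            # The checksum gets computed whether or not dedup is on; the only
            # extra work dedup adds is this table lookup and refcount bump.
            digest = hashlib.sha256(data).digest()
            if digest in self.table:              # duplicate block: no new write
                block_id, refs = self.table[digest]
                self.table[digest] = (block_id, refs + 1)
                return block_id
            block_id = len(self.blocks)           # unique block: really store it
            self.blocks.append(data)
            self.table[digest] = (block_id, 1)
            return block_id

    def store_file(store, path):
        # Split a file into fixed-size blocks and write each one.
        with open(path, "rb") as f:
            return [store.write(chunk)
                    for chunk in iter(lambda: f.read(BLOCK_SIZE), b"")]

Push two identical files through store_file() and the second one costs nothing but refcount bumps. My understanding is that in real ZFS the lookup structure is keyed by the block checksum that is already being computed for integrity anyway, which is why the incremental CPU cost stays small.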