On Thu, Dec 17, 2009 at 6:14 PM, Kjetil Torgrim Homme <kjeti...@linpro.no> wrote:
> Darren J Moffat <darr...@opensolaris.org> writes:
>> Kjetil Torgrim Homme wrote:
>>> Andrey Kuzmin <andrey.v.kuz...@gmail.com> writes:
>>>
>>>> Downside you have described happens only when the same checksum is
>>>> used for data protection and duplicate detection. This implies sha256,
>>>> BTW, since fletcher-based dedupe has been dropped in recent builds.
>>>
>>> if the hash used for dedup is completely separate from the hash used
>>> for data protection, I don't see any downsides to computing the dedup
>>> hash from uncompressed data. why isn't it?
>>
>> It isn't separate because that isn't how Jeff and Bill designed it.
>
> thanks for confirming that, Darren.
>
>> I think the design they have is great.
>
> I don't disagree.
>
>> Instead of trying to pick holes in the theory can you demonstrate a
>> real performance problem with compression=on and dedup=on and show
>> that it is because of the compression step ?
>
> compression requires CPU, actually quite a lot of it. even with the
> lean and mean lzjb, you will get not much more than 150 MB/s per core or
> something like that. so, if you're copying a 10 GB image file, it will
> take a minute or two, just to compress the data so that the hash can be
> computed so that the duplicate block can be identified. if the dedup
> hash was based on uncompressed data, the copy would be limited by
> hashing efficiency (and dedup tree lookup)
This isn't exactly true. If, speculatively, one stores two hashes (one over the
uncompressed data, kept in the DDT for duplicate detection, and another over the
compressed data, kept with the data block for data healing), then one avoids
compressing duplicates and pays an extra hash computation for singletons. So the
more precise question is whether the set of cases where the duplicate/singleton
ratio and the compression/hashing bandwidth ratios are such that one wins is
non-empty (or, rather, of practical importance). A rough numerical sketch of
both points follows below the quoted text.

Regards,
Andrey

> I don't know how tightly interwoven the dedup hash tree and the block
> pointer hash tree are, or if it is at all possible to disentangle them.
>
> conceptually it doesn't seem impossible, but that's easy for me to
> say, with no knowledge of the zio pipeline...
>
> oh, how does encryption play into this? just don't? knowing that
> someone else has the same block as you is leaking information, but that
> may be acceptable -- just make different pools for people you don't
> trust.
>
>> Otherwise if you want it changed code it up and show how what you have
>> done is better in all cases.
>
> I wish I could :-)
>
> --
> Kjetil T. Homme
> Redpill Linpro AS - Changing the game
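As a sanity check on the quoted numbers, here is a minimal back-of-the-envelope
sketch in Python. Only the 150 MB/s lzjb figure and the 10 GB file size come
from Kjetil's example; the SHA-256 throughput and the compression ratio are
placeholder assumptions, not measurements.

# Rough timing of the 10 GB copy example above.
FILE_MB   = 10 * 1024   # 10 GB image file (from the quoted example)
LZJB_MB_S = 150         # per-core lzjb throughput (quoted estimate)
HASH_MB_S = 200         # assumed per-core SHA-256 throughput
RATIO     = 0.5         # assumed compressed/uncompressed size ratio

# current pipeline: compress first, then hash the (smaller) compressed data
compress_then_hash = FILE_MB / LZJB_MB_S + FILE_MB * RATIO / HASH_MB_S
# hypothetical pipeline: dedup hash over the uncompressed data only
hash_uncompressed = FILE_MB / HASH_MB_S

print(f"compress, then hash compressed data: ~{compress_then_hash:.0f} s")  # ~94 s
print(f"hash uncompressed data only:         ~{hash_uncompressed:.0f} s")   # ~51 s

With these made-up ratios the compression step dominates, which is consistent
with the "minute or two" estimate in the quoted text.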
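And a minimal sketch of the duplicate/singleton trade-off described above, under
the same assumed numbers; the crossover point obviously moves with the real
bandwidth ratios and the actual dedup ratio.

# Expected CPU cost per MB written, as a function of the duplicate fraction.
LZJB_MB_S = 150    # per-core lzjb throughput (from the quoted estimate)
HASH_MB_S = 200    # assumed per-core SHA-256 throughput
RATIO     = 0.5    # assumed compressed/uncompressed size ratio

def cost_current(dup_fraction):
    # Current design: every block is compressed and the compressed bytes are
    # hashed before the duplicate can be identified, so the cost does not
    # depend on the duplicate fraction.
    return 1 / LZJB_MB_S + RATIO / HASH_MB_S

def cost_two_hashes(dup_fraction):
    # Speculative design: the dedup hash is computed over the uncompressed
    # block; only singletons are then compressed and hashed a second time
    # (over the compressed bytes) for data protection.
    singletons = 1 - dup_fraction
    return 1 / HASH_MB_S + singletons * (1 / LZJB_MB_S + RATIO / HASH_MB_S)

for d in (0.0, 0.25, 0.5, 0.75, 1.0):
    a, b = cost_current(d), cost_two_hashes(d)
    better = "two hashes" if b < a else "compress+hash"
    print(f"dup fraction {d:.2f}: {a*1e3:.1f} vs {b*1e3:.1f} ms per MB -> {better}")

With these particular numbers the two-hash scheme only pays off once a bit more
than half of the blocks written are duplicates, which is exactly the "non-empty,
but is it of practical importance" question.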