Darren J Moffat <darr...@opensolaris.org> writes:

> Kjetil Torgrim Homme wrote:
>
>> I don't know how tightly interwoven the dedup hash tree and the block
>> pointer hash tree are, or if it is at all possible to disentangle them.
>
> At the moment I'd say very interwoven by design.
>
>> conceptually it doesn't seem impossible, but that's easy for me to
>> say, with no knowledge of the zio pipeline...
>
> Correct, it isn't impossible, but there would probably need to be
> two checksums held: one of the untransformed data (i.e. uncompressed
> and unencrypted) and one of the transformed data (compressed and
> encrypted). That has different tradeoffs, and SHA256 can be expensive
> too; see:
>
> http://blogs.sun.com/darren/entry/improving_zfs_dedup_performance_via

great work!  SHA256 is more expensive than I thought, even with
misc/sha2 it takes 1 ms per 128 KiB?  that's roughly the same CPU usage
as lzjb!  in that case hashing the (smaller) compressed data is more
efficient than doing an additional hash of the full uncompressed block.

it's interesting to note that 64 KiB looks faster (the chart is a bit
hard to read accurately); L1 cache size coming into play, perhaps?
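
(for anyone who wants to poke at this outside the kernel: here's a
rough userland sketch of such a benchmark.  it uses OpenSSL's
libcrypto rather than the misc/sha2 module, so the absolute numbers
won't match Darren's chart, but the shape across block sizes should
be similar:

    /*
     * rough userland sketch: time SHA256 over a few block sizes.
     * build: cc -O2 sha_bench.c -lcrypto -o sha_bench
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <openssl/sha.h>

    int
    main(void)
    {
            size_t sizes[] = { 16*1024, 32*1024, 64*1024, 128*1024 };
            unsigned char digest[SHA256_DIGEST_LENGTH];
            unsigned char *buf = malloc(128*1024);
            int iters = 1000;

            memset(buf, 0xa5, 128*1024);    /* arbitrary filler */

            for (int s = 0; s < 4; s++) {
                    struct timespec t0, t1;
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    for (int i = 0; i < iters; i++)
                            SHA256(buf, sizes[s], digest);
                    clock_gettime(CLOCK_MONOTONIC, &t1);
                    double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                        (t1.tv_nsec - t0.tv_nsec) / 1e3;
                    printf("%6zu KiB: %8.1f us/block\n",
                        sizes[s] / 1024, us / iters);
            }
            free(buf);
            return (0);
    }

if the 64 KiB dip shows up here too, it's probably the cache and not
something specific to the kernel module.)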

> Note also that the compress/encrypt/checksum and the dedup are
> separate pipeline stages, so while dedup is happening for block N,
> block N+1 can be getting transformed - this is designed to take
> advantage of multiple scheduling units (threads, CPUs, cores, etc.).

nice.  are all of them separate stages, or are compress/encrypt/checksum
done as one stage?
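
(to check my mental model of that overlap: a toy two-stage pipeline,
nothing to do with the real zio code, where one thread "transforms"
block N+1 while another "dedups" block N, with a one-slot mailbox in
between:

    /*
     * toy two-stage pipeline, NOT the real zio pipeline.
     * build: cc -O2 pipe_toy.c -o pipe_toy -lpthread
     */
    #include <pthread.h>
    #include <stdio.h>

    #define NBLOCKS 8

    static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    static int slot;        /* block number in the mailbox */
    static int full;        /* mailbox occupied? */
    static int done;        /* transform stage finished */

    static void *
    transform_stage(void *arg)
    {
            for (int n = 0; n < NBLOCKS; n++) {
                    /* ... compress/encrypt/checksum block n here ... */
                    pthread_mutex_lock(&mtx);
                    while (full)
                            pthread_cond_wait(&cv, &mtx);
                    slot = n;
                    full = 1;
                    pthread_cond_signal(&cv);
                    pthread_mutex_unlock(&mtx);
            }
            pthread_mutex_lock(&mtx);
            done = 1;
            pthread_cond_signal(&cv);
            pthread_mutex_unlock(&mtx);
            return (arg);
    }

    int
    main(void)
    {
            pthread_t t;

            pthread_create(&t, NULL, transform_stage, NULL);
            for (;;) {
                    pthread_mutex_lock(&mtx);
                    while (!full && !done)
                            pthread_cond_wait(&cv, &mtx);
                    if (!full) {    /* done and mailbox empty */
                            pthread_mutex_unlock(&mtx);
                            break;
                    }
                    int n = slot;
                    full = 0;
                    pthread_cond_signal(&cv);
                    pthread_mutex_unlock(&mtx);
                    /* ... DDT lookup for block n happens here, while
                     * the other thread is transforming block n+1 ... */
                    printf("dedup block %d\n", n);
            }
            pthread_join(t, NULL);
            return (0);
    }

if that's roughly the shape of it, the dedup lookup latency hides
behind the transform of the next block.)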

>> oh, how does encryption play into this?  just don't?  knowing that
>> someone else has the same block as you is leaking information, but that
>> may be acceptable -- just make different pools for people you don't
>> trust.
>
> compress, encrypt, checksum, dedup.
>
> You are correct that it is an information leak, but only within a
> dataset and its clones, and only if you can observe the deduplication
> stats (and you need to use zdb to get enough info to see the leak,
> which means you have access to the raw devices); the dedupratio
> isn't really enough unless the pool is really idle or has only one
> user writing at a time.
>
> For the encryption case, deduplication of the same plaintext block
> will only work within a dataset or a clone of it, because only in
> those cases do you have the same key (and the way I have implemented
> the IV generation for AES CCM/GCM mode ensures that the same
> plaintext will have the same IV, so the ciphertexts will match).

makes sense.
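
(for the archives: as I understand it, this is convergent encryption
at the block level.  a minimal sketch of the idea, with the details
guessed at (this is not the actual zfs-crypto derivation), would take
the IV from a keyed MAC of the plaintext, so equal plaintexts under
the same key get equal IVs and thus equal ciphertexts:

    /*
     * sketch of deterministic IV derivation, NOT the real
     * zfs-crypto code: derive the AES CCM/GCM IV from an HMAC of
     * the plaintext under a per-dataset key, so the same plaintext
     * in the same dataset always encrypts to the same ciphertext
     * and can therefore dedup.
     * build: cc -O2 iv_sketch.c -lcrypto -o iv_sketch
     */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/hmac.h>

    #define IVLEN 12    /* 96-bit IV, typical for GCM */

    static void
    derive_iv(const unsigned char *ivkey, size_t keylen,
        const unsigned char *ptext, size_t plen,
        unsigned char iv[IVLEN])
    {
            unsigned char mac[32];  /* HMAC-SHA256 output */
            unsigned int maclen;

            HMAC(EVP_sha256(), ivkey, keylen, ptext, plen,
                mac, &maclen);
            memcpy(iv, mac, IVLEN); /* truncate MAC to IV size */
    }

    int
    main(void)
    {
            unsigned char key[32] = { 1 };  /* stand-in dataset key */
            unsigned char iv1[IVLEN], iv2[IVLEN];
            unsigned char block[4096] = "same plaintext block";

            derive_iv(key, sizeof (key), block, sizeof (block), iv1);
            derive_iv(key, sizeof (key), block, sizeof (block), iv2);
            printf("IVs %s\n", memcmp(iv1, iv2, IVLEN) == 0 ?
                "match - ciphertexts will match and dedup" : "differ");
            return (0);
    }

the flip side, of course, is exactly the information leak discussed
above: equal ciphertexts reveal equal plaintexts within the dataset.)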

> Also, if you place a block in an unencrypted dataset that happens to
> match the ciphertext in an encrypted dataset, they won't dedup either
> (you need to understand what I've done with the AES CCM/GCM MAC and
> the zio_cksum_t field in the blkptr_t, and how that is used by dedup,
> to see why).

wow, I hadn't thought of that problem.  did you get bitten by spurious
dedup during testing with image files? :-)
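
(my mental model of why they can't collide, sketched as a purely
hypothetical split of the 256-bit checksum slot - the real blkptr_t
layout is in sys/spa.h and no doubt differs in the details:

    /*
     * hypothetical sketch, NOT the real layout: if the checksum
     * slot in the block pointer carries a plain 256-bit SHA256 for
     * normal blocks but a (truncated hash, MAC) pair for encrypted
     * blocks, then dedup, which compares the whole 256-bit field,
     * will never meaningfully match an unencrypted block against
     * an encrypted one, even when the raw ciphertext bytes happen
     * to be equal.
     */
    #include <stdint.h>

    /* normal block: all four words hold the SHA256 of the data */
    typedef struct cksum_plain_sketch {
            uint64_t        sha256[4];      /* 256 bits */
    } cksum_plain_sketch_t;

    /* encrypted block: checksum slot shared with the AEAD MAC */
    typedef struct cksum_enc_sketch {
            uint64_t        hash[2];        /* truncated SHA256 */
            uint64_t        mac[2];         /* AES CCM/GCM MAC */
    } cksum_enc_sketch_t;

so a cross-dataset false match would be ruled out by construction
rather than by luck.)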

> If that small information leak isn't acceptable even within the
> dataset, then don't enable both encryption and deduplication on those
> datasets - and don't delegate that property to your users either.  Or
> you can frequently rekey your per-dataset data encryption keys ('zfs
> key -K'), but then you might as well turn dedup off - though there
> are some very good use cases in multi-level security where combining
> dedup, encryption, and rekeying provides a nice effect.

indeed.  ZFS is extremely flexible.

thank you for your response, it was very enlightening.
-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

