Hi,

I really appreciate the recently added SSE-S3 encryption in radosgw. As far as I 
know, this encryption works very similarly to the "original" design in Amazon S3:

- it uses a per-bucket master key (used solely to encrypt the data-keys), 
stored in Vault under rgw_crypt_sse_s3_vault_prefix.
- and it creates a per-object data-key to encrypt each individual uploaded 
object, stored encrypted in the object metadata.

In order to do this, Ceph depends on HashiCorp Vault's transit engine, which 
supports exactly this master-key/data-key scenario.
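
For illustration, here is a minimal sketch of the two Vault transit calls this 
scheme boils down to (generate a data key on write, unwrap it on read). The 
Vault address, token handling and key name are placeholders I made up for the 
example; rgw derives the real path and key name from 
rgw_crypt_sse_s3_vault_prefix and the other SSE-S3 vault settings.

import base64
import requests

# Assumptions for the sketch: Vault address/token and the per-bucket key name
# are placeholders; rgw builds the real path from rgw_crypt_sse_s3_vault_prefix.
VAULT_ADDR = "https://vault.example:8200"
HEADERS = {"X-Vault-Token": "s.example-token"}

def generate_data_key(bucket_key_name):
    # On write: ask transit for a fresh data key; the plaintext part encrypts
    # the object, the ciphertext part goes into the object metadata.
    r = requests.post(
        f"{VAULT_ADDR}/v1/transit/datakey/plaintext/{bucket_key_name}",
        headers=HEADERS, json={})
    r.raise_for_status()
    data = r.json()["data"]
    return base64.b64decode(data["plaintext"]), data["ciphertext"]

def unwrap_data_key(bucket_key_name, wrapped):
    # On read: decrypt the wrapped data key taken from the object metadata.
    r = requests.post(
        f"{VAULT_ADDR}/v1/transit/decrypt/{bucket_key_name}",
        headers=HEADERS, json={"ciphertext": wrapped})
    r.raise_for_status()
    return base64.b64decode(r.json()["data"]["plaintext"])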

In contrast to this, the somewhat older SSE-KMS implementation lacks this 
support for individual per-object data-keys. It also lacks support for an 
"undefined" key-id - which is a perfectly valid use-case in Amazon S3.

Now that the new SSE-S3 implementation is done, I would like to ask whether it 
would be possible to rewrite/enhance the SSE-KMS implementation (at least when 
combined with Vault's transit engine) to behave like the SSE-S3 implementation 
(in terms of master-key/data-key, and in terms of generating its own 
per-bucket master key when no key-id is given).

This way, the implementation would be nearly identical to the design 
specification of Amazon S3, and it could be 100% backwards compatible, with no 
impact on existing setups and already stored data. As an implementation note, 
the "new" KMS implementation would simply need to use the same 
functionality/code as the SSE-S3 implementation, extended to support both 
use-cases: a given key-id and an undefined one.

So, in pseudo-code, the KMS implementation could look like this (a more 
concrete sketch follows after the two cases):

- no key-id given:
currently, this throws an unsupported-operation exception. In the future, it 
could simply do the same magic as SSE-S3 (at least when combined with Vault 
transit): get the per-bucket key, or create a new one on the first request 
(stored under rgw_crypt_vault_prefix - this is the difference from SSE-S3). 
Then continue as if the key-id had been given.

- key-id given in the request:
currently, it pulls the key by id from Vault and encrypts the data with it. In 
the future, it could create a new data-key based on the given key-id and use 
that to encrypt the data (exactly as in the SSE-S3 case). 
If Vault's transit engine is not available (e.g. the kv/kv2 engine, or another 
crypto backend without data-key support), simply continue with the old 
behavior: pull the key and encrypt the data. 
For an already stored object: check the object metadata for a data-key stored 
alongside it. If present, use the SSE-S3-like workflow of decrypting the 
data-key and then decrypting the object data. If there is no data-key in the 
object metadata, the "old-workflow" key-id should be stored there instead. In 
that case, use the old workflow of pulling the key from Vault and using it to 
decrypt the data.
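
To make this a bit more concrete, below is a rough sketch of how the unified 
key selection could look. It reuses generate_data_key()/unwrap_data_key() from 
the transit sketch above; kv_fetch_key(), the per-bucket key naming and the 
metadata attribute names are made-up placeholders, not existing rgw code.

def choose_encryption_key(bucket, key_id, engine):
    # Returns (plaintext_key, crypt_attrs): the key used to encrypt the object
    # and the attributes to store in the object metadata.
    if key_id is None:
        if engine != "transit":
            raise NotImplementedError("no key-id and no data-key support")
        # same magic as SSE-S3, but under rgw_crypt_vault_prefix; the naming
        # scheme here is just a placeholder
        key_id = f"rgw-bucket-key-{bucket}"  # created in Vault on first use
    if engine == "transit":
        plaintext, wrapped = generate_data_key(key_id)
        return plaintext, {"key-id": key_id, "wrapped-data-key": wrapped}
    # kv/kv2 or other backends without data-key support: old behavior
    return kv_fetch_key(key_id), {"key-id": key_id}

def choose_decryption_key(crypt_attrs):
    # Pick the key for an already stored object based on its metadata.
    if "wrapped-data-key" in crypt_attrs:
        # SSE-S3-like path: unwrap the per-object data key via transit
        return unwrap_data_key(crypt_attrs["key-id"],
                               crypt_attrs["wrapped-data-key"])
    # old path: the stored key-id points directly at the key in Vault
    return kv_fetch_key(crypt_attrs["key-id"])

def kv_fetch_key(key_id):
    # Placeholder for the existing behavior of pulling the key itself from
    # Vault (kv/kv2 engine); not shown here.
    raise NotImplementedError

The point is only that the decrypt path is chosen per object from its 
metadata, so existing objects would keep working unchanged.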

The changes would not be too complex, and the gains would be that Ceph always 
uses a master-key/data-key scheme (instead of just "the key" given by key-id), 
and it would add support for SSE-KMS without a given key-id. (Amazon calls 
this SSE-KMS with a customer-provided key when the key-id is given, or SSE-KMS 
with an Amazon-managed key when no key-id is given - in both cases the user's 
Vault would be used to store/retrieve the master keys, in contrast to Amazon's 
own internal vault in the case of SSE-S3.)

I would like to help here with ideas.

Best
Stefan

