Hi,

My proposal is not about backups, encrypted or otherwise, though I can see 
there's a relationship. Could the built-in encryption of my proposal also be 
suitable for protecting a backup of these files? Yes, I think so. Given key 
rotation we would expect to eventually have backups that need a wrapping key 
that is no longer the current one, hence the need we both perceive for multiple 
key slots. We differ only in that I pictured filling in the empty slots some 
time after file creation, and merely as a way to avoid a lock-step rotation.

You wondered if encryption should be optional. That's a good topic. In my view 
it's a "yes": encryption is optional, and admins should be able to configure 
encryption for any subset of databases, including none and all of them. It 
should also be possible to configure CouchDB so that it decrypts your databases 
(via compaction). It would also be useful if the wrapping key could vary 
between databases (it doesn't appear useful to go more granular than that). So 
perhaps it is DatabaseName in the callback functions and not WrappingKeyId.
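To make that concrete, a hypothetical default.ini fragment might look like the 
following. The section and key names here are purely illustrative, not part of 
any existing CouchDB release;

```ini
[encryption]
; module implementing the key manager behaviour (illustrative name)
key_manager = my_key_manager

; per-database policy; "none" disables encryption for that database,
; and a compaction after changing this would rewrite the file accordingly
[encryption.databases]
_users = enabled
audit_log = enabled
scratch = none
```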

I agree that we'll need the ability to have multiple key slots. I hadn't 
considered that we'd fill more than one slot at couch_file creation time but I 
don't see why not. We can delegate that to the key manager;

-callback new_key(DatabaseName :: binary()) ->
    {ok, [WrappedKey :: binary()], UnwrappedKey :: binary()} |
    {error, Reason :: term()}.

The key manager might send back a list of one item or several, and couch_file 
is simply obliged to record them at the start of the file. We would also want 
to ensure there are empty slots available, so there might need to be a 
callback along the lines of;

-callback slot_size() -> pos_integer().

So we can know how much space to leave at the start of the file for empty slots.
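As a rough sketch of how couch_file might use these two callbacks together at 
file creation time (the slot count, padding scheme, and function names here 
are all assumptions, not a settled design);

```erlang
%% Illustrative sketch only: write the wrapped keys returned by the key
%% manager into a fixed-size slot area at the start of the file, zero-padding
%% the remaining slots so they can be filled in later.
-define(NUM_SLOTS, 8). %% assumed maximum; not a settled value

write_key_slots(Fd, DatabaseName, KeyManager) ->
    {ok, WrappedKeys, UnwrappedKey} = KeyManager:new_key(DatabaseName),
    SlotSize = KeyManager:slot_size(),
    Slots = [pad_slot(W, SlotSize) || W <- WrappedKeys],
    %% leave the unused slots as zeroes for later filling
    Empty = binary:copy(<<0>>, SlotSize * (?NUM_SLOTS - length(WrappedKeys))),
    ok = file:write(Fd, [Slots, Empty]),
    {ok, UnwrappedKey}.

pad_slot(Wrapped, SlotSize) when byte_size(Wrapped) =< SlotSize ->
    Pad = SlotSize - byte_size(Wrapped),
    <<Wrapped/binary, 0:(Pad * 8)>>.
```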

The unwrap callback in this scheme would be essentially your revised proposal;

-callback unwrap_key(DatabaseName :: binary(), [WrappedKey :: binary()]) ->
    {ok, UnwrappedKey :: binary()} | {error, Reason :: term()}.

I am wary of adding any code path in couchdb where we write anywhere but the 
end of the file, so the actual process of filling in a preallocated empty slot 
will need more thought. The atomicity of disk writes, in theory and in 
practice, comes into play and will likely force some decisions. For example, 
we might be obliged to round up to the nearest 4 KiB (or the sector size of 
the storage device, if we can retrieve it; though it's probably 4 KiB).
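The rounding itself is straightforward; assuming a fixed 4 KiB sector, 
something like this (the 4096 is an assumption, the real sector size may 
differ per device);

```erlang
%% Sketch: round the slot area up to a whole number of sectors, so that a
%% later in-place write of a single slot never straddles a sector boundary.
-define(SECTOR_SIZE, 4096). %% assumed; query the device if possible

slot_area_size(SlotSize, NumSlots) ->
    Bytes = SlotSize * NumSlots,
    ((Bytes + ?SECTOR_SIZE - 1) div ?SECTOR_SIZE) * ?SECTOR_SIZE.
```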

Another option is to store the wrapped keys in the db headers, but this 
presents a few difficulties. couch_file itself has no idea what is in the 
headers, only that they are 4 KiB-aligned and have the magic bit set at the 
start that indicates a real header. So there's a layering issue there, but I 
think we can solve it. The other issue, though, is that the header itself 
could not be encrypted, and I have a strong preference for encrypting every 
byte of the file.

B.


> On 19 May 2022, at 11:17, Will Young <lostnetwork...@gmail.com> wrote:
> 
> On Wed, 18 May 2022, 19:31 Robert Newson, <rnew...@apache.org> wrote:
> 
>> Hi Will,
>> 
>> I still don't see how public/private encryption helps. It seems the
>> private keys in your idea(s) are needed in all the same places and for the
>> same duration as the secret keys in mine. I think it's clear, though, that
>> I didn't fully comprehend your post.
>> 
> 
> I'm a bit confused here, in the example the node(s) never get any access to
> backup's private key. A node would never need to know any other node's
> private key(s). The nodeN encrypted to its own and backup's public keys so
> they can each decrypt the shard key with their private key. If node1 were
> to lose its own keystore or cease to exist, backup's token might finally
> be plugged in (i.e. to node1 or maybe a recent backup of node1's data
> volume is restored to a replacement host with new token) and then using
> backup's private key inside its token one can begin compacting shards (just
> as usual encrypting to a public key for node1's token and the one for
> backups.)
> 
> Once backup's token is unplugged again from this restore operation, the new
> node1 would have no secrets for any past backups; its new key reads only
> these newly updated shards and its own updates to them. Therefore backup
> is holding a master key to restore the history of backups for any node or
> its replacement while each node has a key that could only read back into
> some period of its own backups and can be destroyed whenever we like (as
> long as we are willing to use the backup key).
> 
> I don't see any similar possibility with secured symmetric keys as a
> symmetric key being used as the wrapping key on a node means that either
> only that node has the secret key or the key is improperly secured, i.e.
> many nodes have access to that secret key by simply requesting it or its
> use over the network. One can be sending around wrappings of these
> wrapping keys, etc, and one is more protocol steps away from the hardware
> secured keys.
> 
>> 
>> 
>> Perhaps it would help if I make a strawman proposal for a key manager
>> interface?
>> 
> 
> Yes, I think it would help me understand the intended direction.
> 
>> 
>> Before I do that, imagine a number of small changes to the branch of work
>> I've presented already;
>> 
>> 1. A key manager interface is defined.
>> 2. The concrete implementation of that interface can be defined for a
>> couchdb installation somehow (perhaps in vm.args or default.ini)
>> 3. That couch_file delegates to this interface at the points where it
>> currently needs to wrap or unwrap the file key.
>> 
>> An interface might look like this;
>> 
>> -callback new_wrapped_key(WrappingKeyId :: binary()) ->
>>    {ok, WrappedKey :: binary(), UnwrappedKey :: binary()} | {error,
>> Reason :: term()}.
> 
> 
> 
>> 
>> -callback unwrap_key(WrappingKeyId::binary(), WrappedKey::binary()) ->
>>    {ok, UnwrappedKey :: binary()} | {error, Reason :: term()}.
>> 
>> couch_file would call new_wrapped_key when creating a new file, and would
>> receive the wrapped form, for writing to the header, and the unwrapped
>> form, for initialising the ciphers held in the state variable.
>> 
>> For existing files, couch_file would read the wrapped key from the file
>> and call unwrap_key to retrieve the unwrapped form, for the same purpose as
>> previous.
>> 
>> An implementation of this interface could be done in erlang, as I've
>> already shown, or could involve a remote network connection to some service
>> that does it for us (and, one hopes, does so over HTTPS).
>> 
>> So the questions I'm most interested in discussing are whether this is the
>> right level of abstraction and, if not, what others think would be?
>> 
> 
> 
> To me it would make more sense to work with the group of header slots and
> the configuration for them as if they were a block, very roughly:
> 
> -callback new_shard_key(<< keyid1::binary(),  keyid2::binary() >>) ->
>  {ok, Slots :: binary(), UnwrappedKey :: binary()} | {error, Reason ::
> term()}.
> 
> -callback unwrap_shard_key(WrappingKeyId::binary(), Slots::binary()) ->   %
> search slots for shard key wrapped with this keyid and unwrap it
>    {ok, UnwrappedKey :: binary()} | {error, Reason :: term()}.
> 
> My thinking is that:
> Even at initial creation with only symmetric keys, I think a shard should
> be able to have 2 wrapping keys that can decode it similar to our earlier
> discussion (i.e. one key is new and the desired key to keep for this month
> while the other is last months and definitely retrievable after a power
> outage reverting only system partitions, etc.)
> If for some reason we couldn't build all the currently required slots we
> don't actually want to create a new shard key. In creation we could only
> fail (unless encryption is optional), in compaction we may still want to do
> something but whatever we do, we need to keep using the old key so it is
> retrievable using the past slots until the problem is resolved.
> 
> Thoughts?
> Will
