Hello Stephen,
I'm unsure about what the "common use case" is. ISTM that having the
postgres process hold the master key looks like a "no" to me in any use
case with actual security concerns: as the database must be remotely
accessible, a reasonable security assumption is that a pg account could be
compromised, and the "master key" (if any, that is just one particular
cryptographic design) should not be accessible in that case. The first
barrier would be the pg admin account. With an external approach, the options
are that the key is held by another id, so the host must be compromised as
well, or it could even be managed remotely on another host (I agree that this
latter option would be fragile because of the implied network connection,
but it would be a tradeoff depending on the security concerns, and it pretty
much corresponds to the Kerberos model).
No, this isn't anything like the Kerberos model and there's no case
where the PG account won't have access to the DEK (data encryption key)
during normal operation (except with the possibility of off-loading to a
hardware crypto device which, while interesting, is definitely something
that's of very limited utility and which could be added on as a
capability later anyway). This is also well understood by those who are
interested in this capability and it's perfectly acceptable that PG will
have, in memory, the DEK.
I'm sorry that I'm not very clear. My complaint is about having a KEK "master
key" in pg memory. I think I'm fine at this point with having DEKs available
on the host to encrypt/decrypt files. What I meant about Kerberos is that the
setup I was describing makes the database dependent on a remote host,
which is somewhat what Kerberos does.
The keys which are stored on disk with the proposed patch are encrypted
using a KEK which is external to PG- that's all part of the existing
patch and design.
Yep. My point is that the patch hardwires a cryptographic design with many
assumptions, whereas I think it should define an API compatible with a
larger class of designs, and possibly implement one of them with a KEK and
so on; I'm fine with that.
Adding a pre-defined system will not prevent future work on a
completely external option.
Yes and no.
Having two side-by-side cluster-encryption schemes in core, the first of
which could be implemented on top of the second without much effort, would
look pretty weird. Moreover, the changes required in core to allow an API are
the very same as the ones in this patch. The other concern I have with doing
the cleanup work afterwards is that the API would end up somewhere in the
middle of the existing functions if it has not been thought out beforehand.
This has been considered and the functions which are proposed are
intentionally structured to allow for multiple implementations already,
Does it? This is unclear to me from looking at the code, and the limited
documentation does not point to that. I can see that the "kek provider"
and the "random provider" can be changed, but the overall cryptographic
design seems hardwired.
so it's unclear to me what the argument here is.
The argument is that professional cryptographers get designs wrong, and I do
not see why you would do differently. So instead of hardwiring choices that
you think are THE good ones, ISTM that pg should design a reasonably
flexible API, and also provide an implementation, obviously.
Further, we haven't even gotten to actual cluster encryption yet- this
is all just the key management side of things,
With hardwired choices about 1 KEK and 2 DEKs that are debatable, see
below.
which is absolutely a necessary and important part but arguing that it
doesn't address the cluster encryption approach in core simply doesn't
make any sense as that's not a part of this patch.
Let's have that discussion when we actually get to the point of talking
about cluster encryption.
I know archive_command is completely external, but that is much simpler
than what would need to be done for key management, key rotation, and key
verification.
I'm not sure about the design of the key rotation stuff as I recall it from
the patches I looked at, but the API design looks like the more pressing
issue. First, the mere existence of a "master key" is a cryptographic choice
which is debatable, because it creates issues if one must be able to change
it, hence the whole "key rotation" stuff. ISTM that what core needs to know
is that one particular key is replaced by a new one and the underlying file
is rewritten. It should not need to know anything about master keys and key
derivation at all. It should need to know that the user wants to change file
keys, though.
No, it's not debatable that a KEK is needed, I disagree entirely.
ISTM that there is one cryptographic design which suits your needs and you
seem to have decided that it is the only possible way to do it. It seems
that you know. As a researcher, I know that I do not know :-) As a software
engineer, I'm trying to suggest enabling more with the patch, including
not hardwiring a three-key management scheme.
I'm advocating that pg not decide for others what is best for them,
but rather allow multiple choices and implementations through an API. Note
that I'm not claiming that the 1 KEK + 2 DEK + AES… design is bad. I'm
claiming that it is just one possible design, quite a peculiar one though,
and that pg should not hardwire it, and it does not need to anyway.
The documentation included in the key management patch states: "Key zero
is the key used to encrypt database heap and index files which are stored
in the file system, plus temporary files created during database
operation." Read as is, it suggests that the same key is used to encrypt
many files. From a cryptographic point of view this looks like a bad idea.
The mode used is unclear. If this is a stream cipher generated in counter
mode, it could be a very bad idea: reusing the same key and counter on two
different files produces the same keystream, so XORing the two ciphertexts
reveals the XOR of the plaintexts. Hopefully this is not what is intended,
but the included documentation is frankly misleading in that case. IMHO
there should be one key per file. Also, the CTR mode should be avoided if
possible because it has quite special properties, unless these properties
are absolutely necessary.
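To make the counter-mode concern concrete, here is a small standalone
illustration, not taken from the patch: it assumes OpenSSL's EVP API and a
made-up fixed iv. Encrypting two different "pages" with the same key and the
same counter yields ciphertexts whose XOR equals the XOR of the plaintexts,
so one ciphertext leaks information about the other.

  /* build with: cc ctr_demo.c -lcrypto */
  #include <stdio.h>
  #include <openssl/evp.h>

  static void
  ctr_encrypt(const unsigned char *key, const unsigned char *iv,
              const unsigned char *in, unsigned char *out, int len)
  {
      EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
      int outl;

      EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
      EVP_EncryptUpdate(ctx, out, &outl, in, len);
      EVP_EncryptFinal_ex(ctx, out + outl, &outl);
      EVP_CIPHER_CTX_free(ctx);
  }

  int
  main(void)
  {
      unsigned char key[32] = "0123456789abcdef0123456789abcdef";
      unsigned char iv[16]  = "fixed-counter-iv";   /* same counter reused: bad */
      unsigned char p1[16]  = "page one content";
      unsigned char p2[16]  = "page two content";
      unsigned char c1[16], c2[16];

      ctr_encrypt(key, iv, p1, c1, 16);
      ctr_encrypt(key, iv, p2, c2, 16);

      /* c1 ^ c2 equals p1 ^ p2, byte for byte */
      for (int i = 0; i < 16; i++)
          printf("%02x/%02x ", c1[i] ^ c2[i], p1[i] ^ p2[i]);
      printf("\n");
      return 0;
  }

This is of course only an argument against reusing (key, counter) pairs;
with one key per file, or an iv that provably never repeats, CTR does not
have this particular problem.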
From what I see in the patch, the DEKs cannot be changed, only the KEK
which encrypts the DEKs. I cannot say I'm thrilled by such a property. If the
KEK is lost, then probably the DEKs are lost as well. Changing the KEK
does not change the DEKs, so an attacker would still be able to
retrieve the contents of later/previous states of the database if they can
retrieve a previous/later encrypted version, even if the KEK is changed
every day. This is a bad property which somehow provides a false sense of
security. I'm not sure that it makes much sense to change a KEK without
changing the associated DEKs, security-wise.
Anyway, I'd suggest providing a script to re-encrypt a cluster, possibly
while offline, with different DEKs. From an audit perspective, this could
be a requirement.
I'd expect a very detailed README or documentation somewhere, but there
is none in the committed/uncommitted patches.
In https://wiki.postgresql.org/wiki/Transparent_Data_Encryption, I see a
lot of minute cryptographic design decisions which are debatable, and a few
which seem to be still open, whereas other things are already in the
process of being committed. I'm annoyed by the decision to reuse the same
key for a lot of encryption, even if there are discussions about changing
the iv in some way. If you have some clever way to build an iv, then
please, please, change the key as well.
ISTM that pg at the core level should (only) directly provide:
(1) a per-file encryption scheme, with loadable (hook-replaceable??)
functions to manage pages, maybe:
encrypt(page_id, *key, *clear_page, *encrypted_page);
decrypt(page_id, *key, *encrypted_page, *clear_page);
What encrypt/decrypt do is beyond pg core's business. Ok, a reasonable
implementation should be provided, obviously, possibly in core. There may
be some block-size issues if full pages are not encrypted, so maybe the
interface needs to be a little more subtle (see the sketch after this list
for what (1) and (2) could look like).
(2) offer a key management scheme interface, to manage *per-file* keys,
possibly loadable (hook-replaceable?). If all files have the same key,
which is stored in a directory and encrypted with a KEK, this is just one
(debatable) implementation choice. For that, ISTM that what is needed at
this level is:
get_key(file_id (relative name? oid? 8 or 16 bytes something?));
Maybe that should be enough to create a new key implicitly, or maybe it
should be a different interface, e.g. get_new_key(file_id), not sure. There
is the question of the key length, which may for instance include an iv.
Maybe an information-seeking function should be provided, to know about
sizes or whatever, not sure. Maybe there should/could be a parameter to
manage key versions, not sure.
The internal key infrastructure would interact with this function (or these
functions) when needed, and store the key in the file management structure.
All this should be a black box to core pg; that is the point of an API. This
is what I'd have expected to see in the "postgres core key management" stuff,
not a hardwired 2-key store with a KEK.
(3) ISTM that the key management interface should be external, or at least
it should be possible to make it external easily. I do not think that
there is a significant performance issue because keys are needed only once,
and once loaded they are there. A simple way to do that is a separate process
with a basic protocol on stdin/stdout to implement "get_key", which is
basically already half implemented in the patch for retrieving the KEK.
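To be concrete, here is a rough sketch, in C, of the kind of interface I have
in mind for (1) and (2). Every name in it (FileKey, page_encrypt_hook,
get_file_key_hook, ...) is made up for illustration and is not part of the
existing patch:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  #define MAX_FILE_KEY_SIZE 64    /* upper bound on key material, arbitrary */

  /* opaque key material for one file; core never interprets it */
  typedef struct FileKey
  {
      size_t  len;
      uint8_t bytes[MAX_FILE_KEY_SIZE];
  } FileKey;

  /* (1) per-page encryption, hook-replaceable; core only moves bytes around */
  typedef bool (*page_encrypt_hook_type) (uint64_t page_id, const FileKey *key,
                                          const void *clear_page,
                                          void *encrypted_page, size_t page_size);
  typedef bool (*page_decrypt_hook_type) (uint64_t page_id, const FileKey *key,
                                          const void *encrypted_page,
                                          void *clear_page, size_t page_size);

  /* (2) per-file key management, also hook-replaceable: how keys are created,
   * derived, wrapped by a KEK or fetched from elsewhere is the provider's
   * business, not core's */
  typedef bool (*get_file_key_hook_type) (const char *file_id, FileKey *key);
  typedef bool (*new_file_key_hook_type) (const char *file_id, FileKey *key);

  extern page_encrypt_hook_type page_encrypt_hook;
  extern page_decrypt_hook_type page_decrypt_hook;
  extern get_file_key_hook_type get_file_key_hook;
  extern new_file_key_hook_type new_file_key_hook;

Core could install defaults for these hooks, for instance the 1-KEK/2-DEK
scheme of the current patch, while a provider implementing (3) could answer
get_file_key by writing something like "get_key <file_id>" to a separate
key-manager process over a pipe and reading the key material back, much as
the patch already half does for retrieving the KEK.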
Perhaps we can have support for an external key store in the future for
the DEKs, but we really need to have our own key management and the
ability to reasonably rotate keys (which is what the KEK provides).
I think that the KEK/2-DEK store is a peculiar design which does not
*need* to be hardwired deeply in core.
Ideally, we'll get to a point where we can even have multiple DEKs and
deal with rotating them too, but that, if anything, points even more strongly
to the need to have a key management system and KEK as we'll be dealing
with that many more keys.
I will say that if the community feels external-only should be the only
option, I will stop working on this feature because I feel the result
would be too fragile to be reliable,
I do not see why that would be the case. I'm just arguing to have key
management in a separate, possibly suid-something-else, process, which, given
the security concerns that dictate the feature, looks like a must-have, or
at least something that must be possible. From a line count point of view,
it should be a small addition to the current code.
All of this hand-waving really isn't helping.
I'm sorry that you feel that way. I'm really trying to help improve the
approach and design with my expertise. As I said in another mail, I'm no
one; you can always choose to ignore me.
If it's a small addition to the current code then it'd be fantastic if
you'd propose a specific patch which adds what you're suggesting.
My point is to discuss an API.
Once there is some agreement on the possible API outline, then
implementations can be provided, but if people do not want an API in the
first place, I'm not sure I can see the point of sending patches.
I don't think either Bruce or I would have any issue with others helping
out on this effort, but let's be clear- we need something that *is* part
of core PG, even if we have an ability to have other parts exist outside
of PG.
Yep. I have not suggested that PG should not implement the API; on the
contrary, it definitely should.
--
Fabien.