Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Tomas Vondra Sat, 13 Jul 2019 14:58:32 -0700

On Sat, Jul 13, 2019 at 02:41:34PM -0400, Joe Conway wrote:

On 7/13/19 9:38 AM, Joe Conway wrote:

On 7/11/19 9:05 PM, Bruce Momjian wrote:

On Thu, Jul 11, 2019 at 08:41:52PM -0400, Joe Conway wrote:

On 7/11/19 6:37 PM, Bruce Momjian wrote:
> Our first implementation will encrypt the entire cluster.  We can later
> consider encryption per table or tablespace.  It is unclear if
> encrypting different parts of the system with different keys is useful
> or feasible.  (This is separate from key rotation.)


I still object strongly to using a single key for the entire database. I
think we can use a single key for WAL, but we need some way to split the
heap so that multiple keys are used. If not by tablespace, then some
other method.


What do you base this on?


Ok, so here we go. See links below. I skimmed through the entire thread
and FWIW it was exhausting.

To some extent this degenerated into a general search for relevant
information:

---
[1] and [2] show that at least some file system encryption uses a
different key per file.
---
[2] also shows that file system encryption uses a KDF (key derivation
function) which we may want to use ourselves. The analogy would be
per-table derived key instead of per file derived key. Note that KDF is
a safe way to derive a key and it is not the same as a "related key"
which was mentioned on another email as an attack vector.
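
To make the per-table derived key idea concrete, here is a minimal sketch using HKDF (RFC 5869) built only from Python's hmac and hashlib. It is an illustration under assumptions, not a proposed design: the choice of HKDF and SHA-256, the "table-key:" label, the use of the table OID as context, and the name derive_table_key are all hypothetical, and whether HKDF is the KDF the referenced RFC has in mind is not something this thread settles.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: condense input key material into a pseudorandom key."""
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is given
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: stretch the pseudorandom key into `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_table_key(master_key: bytes, table_oid: int, length: int = 32) -> bytes:
    """Derive a per-table key; label and OID encoding are assumptions."""
    info = b"table-key:" + table_oid.to_bytes(4, "big")
    prk = hkdf_extract(b"", master_key)
    return hkdf_expand(prk, info, length)


master = bytes(range(32))  # placeholder master key, for illustration only
key_a = derive_table_key(master, 16384)
key_b = derive_table_key(master, 16385)
print(key_a.hex())
print(key_b.hex())
```

The derived keys are deterministic given the master key and the context, so nothing per-table needs to be stored, and two tables with different OIDs get unrelated-looking keys.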
---
[2] also provides additional support for AES 256. It also mentions
CBC versus XTS -- I came across this elsewhere and it bears discussion:

"Currently, the following pairs of encryption modes are supported:

   AES-256-XTS for contents and AES-256-CTS-CBC for filenames
   AES-128-CBC for contents and AES-128-CTS-CBC for filenames
   Adiantum for both contents and filenames

If unsure, you should use the (AES-256-XTS, AES-256-CTS-CBC) pair.

AES-128-CBC was added only for low-powered embedded devices with crypto
accelerators such as CAAM or CESA that do not support XTS."
---
[2] also states this, which again makes me think in terms of table being
the moral equivalent to a file:

"Unlike dm-crypt, fscrypt operates at the filesystem level rather than
at the block device level. This allows it to encrypt different files
with different keys and to have unencrypted files on the same
filesystem. This is useful for multi-user systems where each user’s
data-at-rest needs to be cryptographically isolated from the others.
However, except for filenames, fscrypt does not encrypt filesystem
metadata."
---
[3] suggests 68 GB per key and unique IV in GCM mode.
---
[4] specifies 68 GB per key and unique IV in CTR mode -- this applies
directly to our proposal to use CTR for WAL.
---
[5] has this to say which seems independent of mode:

"When encrypting data with a symmetric block cipher, which uses blocks
of n bits, some security concerns begin to appear when the amount of
data encrypted with a single key comes close to 2n/2 blocks, i.e. n*2n/2
bits. With AES, n = 128 (AES-128, AES-192 and AES-256 all use 128-bit
blocks). This means a limit of more than 250 millions of terabytes,
which is sufficiently large not to be a problem. That's precisely why
AES was defined with 128-bit blocks, instead of the more common (at that
time) 64-bit blocks: so that data size is practically unlimited."


FWIW I was a bit confused at first, because the copy paste mangled the
formulas a bit - it should have been 2^(n/2) and n*2^(n/2).

But it goes on to say:
"I wouldn't use n*2^(n/2) bits in any sort of recommendation. Once you
reach that number of bits the probability of a collision will grow
quickly and you will be way over 50% probability of a collision by the
time you reach 2*n*2^(n/2) bits. In order to keep the probability of a
collision negligible I recommend encrypting no more than n*2^(n/4) bits
with the same key. In the case of AES that works out to 64GB"
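
For what it's worth, the numbers in those two quotes can be checked with a few lines of arithmetic. The sketch below just evaluates n*2^(n/2) and n*2^(n/4) bits for n = 128 and converts to bytes. The variable names are mine. The observation that 2^36 bytes is about 68.7 GB in decimal units, and so may be the same figure as the "68 GB" above, is only a guess on my part; the sources quoted here do not say so.

```python
n = 128  # AES block size in bits (same for AES-128, AES-192, AES-256)

# Birthday bound: 2^(n/2) blocks of n bits each
birthday_bits = n * 2 ** (n // 2)
# More conservative recommendation: 2^(n/4) blocks of n bits each
conservative_bits = n * 2 ** (n // 4)

birthday_bytes = birthday_bits // 8          # 2^68 bytes
conservative_bytes = conservative_bits // 8  # 2^36 bytes

# Millions of (decimal) terabytes at the birthday bound
print(f"{birthday_bytes / 10**12 / 10**6:.1f} million TB")
# The conservative limit in binary and decimal gigabytes
print(f"{conservative_bytes / 2**30:.1f} GiB")
print(f"{conservative_bytes / 10**9:.1f} GB")
```

So the first limit comes out at roughly 295 million terabytes, consistent with "more than 250 millions of terabytes", and the second at exactly 64 GiB.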

It is hard to say if that recommendation is per key or per key+IV.


Hmm, yeah. The question is what collisions they have in mind? Presumably
it's AES(block1,key) = AES(block2,key) in which case it'd be with fixed
IV, so per key+IV.

---
[6] shows that Azure SQL Database uses AES 256 for TDE. It also seems to
imply a single key is used although at one point it says "transparent
data encryption master key, also known as the transparent data
encryption protector". The term "master key" indicates that they likely
use derived keys under the covers.
---
[7] is a generally useful read about how many of the things we have been
discussing are done in SQL Server.
---
[8] was referenced by Sehrope. In addition to support for AES 256 for
long term use, table 5.1 is interesting. It lists CBC mode as "legacy"
but not "future".
---
[9] IETF RFC for KDF
---
[10] IETF RFC for Key wrapping -- this is probably how we should wrap
the master key with the Key Encryption Key (KEK) -- i.e. the outer key
provided by the user or command on postmaster start
---

Based on all of that I cannot find a requirement that we use more than
one key per database.

But I did find that files in an encrypted file system are encrypted with
derived keys from a master key, and I view this as analogous to what we
are doing.


My understanding always was that we'd do something like that, i.e. we'd
have a master key (or perhaps multiple of them, for various users), but
the data would be encrypted with secondary (generated) keys, and those
secondary keys would be encrypted by the master key. At least that's
what was proposed at the beginning of this thread by Insung Moon.

But AFAICS the 2-tier key scheme is primarily motivated by operational
reasons, i.e. effort to rotate the master key etc. So I would not expect
to find recommendations to use multiple keys in sources primarily
dealing with cryptography.

One extra thing we should consider is authenticated encryption. We can't
just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
as that does not provide integrity protection (i.e. can't detect when
the ciphertext was corrupted, whether by disk failure or intentionally).
And we can't quite rely on checksums, because the checksum is computed
over the plaintext and is stored encrypted.

Which seems pretty annoying, because then the checksums won't verify
data as sent to the storage system, and verifying checksums would require
access to all keys (how do you do that in offline mode?).

But the main issue with checksum-then-encrypt is it's essentially
"MAC-then-Encrypt" and that does not provide Authenticated Encryption
security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
in which case we'll need to store the MAC somewhere (probably in the
same place as the nonce/IV/key/... for each page).
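
A rough sketch of what Encrypt-then-MAC for a page could look like, using only the Python standard library. Loud caveat: keystream_xor is a toy SHAKE-256 keystream standing in for a real cipher mode such as AES-CTR, purely so the example runs without external libraries; it is not meant for real use. The page layout (nonce, ciphertext, tag), the separate MAC key, binding the page number into the MAC, and the names seal_page/open_page are all assumptions for illustration, not something proposed in this thread.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
MAC_LEN = 32  # HMAC-SHA256 tag size in bytes


def keystream_xor(enc_key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher; a stand-in for AES-CTR, NOT for real use."""
    stream = hashlib.shake_256(enc_key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal_page(enc_key: bytes, mac_key: bytes, page_no: int, plaintext: bytes) -> bytes:
    """Encrypt first, then MAC the page number, nonce and ciphertext."""
    nonce = os.urandom(NONCE_LEN)
    ciphertext = keystream_xor(enc_key, nonce, plaintext)
    header = page_no.to_bytes(8, "big") + nonce
    tag = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_page(enc_key: bytes, mac_key: bytes, page_no: int, blob: bytes) -> bytes:
    """Verify the MAC over the ciphertext before attempting to decrypt."""
    nonce = blob[:NONCE_LEN]
    ciphertext = blob[NONCE_LEN:-MAC_LEN]
    tag = blob[-MAC_LEN:]
    header = page_no.to_bytes(8, "big") + nonce
    expected = hmac.new(mac_key, header + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("page failed integrity check")
    return keystream_xor(enc_key, nonce, ciphertext)


enc_key, mac_key = os.urandom(32), os.urandom(32)
page = b"\x00" * 8192
blob = seal_page(enc_key, mac_key, 42, page)
assert open_page(enc_key, mac_key, 42, blob) == page
```

One property of this arrangement is that the integrity check covers the bytes as written to storage and needs only the MAC key, not the encryption key, which may be relevant to the offline verification concern above.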

I've also stumbled upon [2], which is a nice doctoral thesis about disk
encryption - in particular chapter 4 is a nice overview of the threat
model and use cases. That guy also had a nice talk at FOSDEM 2018 about
data dm-integrity etc. [3]

[1] https://www.cosic.esat.kuleuven.be/school-iot/slides/AuthenticatedEncryptionII.pdf

[2] https://is.muni.cz/th/vesfr/final.pdf

[3] https://ftp.fau.de/fosdem/2018/Janson/cryptsetup.mp4


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
