Hi all,

We briefly discussed this subject at a June sync and decided to continue the
discussion on the mailing list.
Jack and I have opened a number of pull requests that implement separate
(disjoint) elements of the high-level design
<https://docs.google.com/document/d/1kkcjr9KrlB9QagRX3ToulG_Rf-65NMSlVANheDNzJq4/edit?usp=sharing>.
Some low-level details, such as the generation and propagation of data keys,
are not covered in that document.
I have created a short (and hopefully simple) doc
<https://docs.google.com/document/d/19O_qiQumz_66CdWLpw38GFJEsUpnNxXckP9rnYIQnCo/edit?usp=sharing>
that focuses on these details. It describes a bottom-up approach to the
generation of data keys, the encryption of data/delete files, and the
options/phases for optimizing key management. The scope of the document is
intentionally narrow: it currently focuses on the minimal (simplest) option.
Reviews are very welcome. Later, this doc will be merged into (or referenced
from) the master design document.

Huaxin recently sent a PR with basic encryption DDL; you can find it here
<https://github.com/apache/iceberg/pull/3013>. Next week, I'll send a pull
request with an implementation of the minimal encryption option. It collects
the basics from my PRs 2639, 2638 and 2640 and Jack's PR 2443, and adds the
key generation and other code needed for an end-to-end implementation of the
minimal design
<https://docs.google.com/document/d/19O_qiQumz_66CdWLpw38GFJEsUpnNxXckP9rnYIQnCo/edit?usp=sharing>.
The PR will include an example proposed by Ryan: using a table encryption
key from a keyfile (in "pkcs12" format, the closest thing to the "pem" format
for symmetric keys).
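For anyone who hasn't used a PKCS12 keyfile for a symmetric key, here is a rough sketch built only on the standard JDK `java.security.KeyStore` and `javax.crypto` APIs. It assumes a 128-bit AES table key stored as a secret-key entry in a PKCS12 keystore. It also assumes, purely for illustration, that a freshly generated data key is wrapped with the table key using AES-GCM. The class `KeyfileSketch`, its methods, the alias and the password are all hypothetical. This is not the API of the upcoming PR or of Iceberg.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.Key;
import java.security.KeyStore;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration only: not Iceberg's API.
public class KeyfileSketch {
  private static final int GCM_NONCE_LENGTH = 12; // bytes
  private static final int GCM_TAG_BITS = 128;
  private static final SecureRandom RANDOM = new SecureRandom();

  // Generates a random 128-bit AES key (used here for both the table key and data keys).
  static SecretKey newAesKey() throws Exception {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128, RANDOM);
    return gen.generateKey();
  }

  // Stores a secret key in a PKCS12 keystore and returns its bytes
  // (an in-memory stand-in for a keyfile on disk).
  static byte[] writeKeyfile(SecretKey key, String alias, char[] password) throws Exception {
    KeyStore ks = KeyStore.getInstance("PKCS12");
    ks.load(null, null); // initialize an empty keystore
    ks.setEntry(alias, new KeyStore.SecretKeyEntry(key), new KeyStore.PasswordProtection(password));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ks.store(out, password);
    return out.toByteArray();
  }

  // Loads the table key back from the PKCS12 bytes.
  static SecretKey readTableKey(byte[] keyfile, String alias, char[] password) throws Exception {
    KeyStore ks = KeyStore.getInstance("PKCS12");
    ks.load(new ByteArrayInputStream(keyfile), password);
    Key key = ks.getKey(alias, password);
    return new SecretKeySpec(key.getEncoded(), "AES");
  }

  // Wraps (encrypts) a data key with the table key using AES-GCM.
  // Output layout (an assumption of this sketch): nonce || ciphertext || tag.
  static byte[] wrap(SecretKey tableKey, SecretKey dataKey) throws Exception {
    byte[] nonce = new byte[GCM_NONCE_LENGTH];
    RANDOM.nextBytes(nonce); // fresh random nonce per wrap
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, tableKey, new GCMParameterSpec(GCM_TAG_BITS, nonce));
    byte[] ct = cipher.doFinal(dataKey.getEncoded());
    byte[] result = new byte[nonce.length + ct.length];
    System.arraycopy(nonce, 0, result, 0, nonce.length);
    System.arraycopy(ct, 0, result, nonce.length, ct.length);
    return result;
  }

  // Unwraps a data key; throws AEADBadTagException if the wrapped bytes were tampered with.
  static SecretKey unwrap(SecretKey tableKey, byte[] wrapped) throws Exception {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.DECRYPT_MODE, tableKey,
        new GCMParameterSpec(GCM_TAG_BITS, wrapped, 0, GCM_NONCE_LENGTH));
    byte[] keyBytes = cipher.doFinal(wrapped, GCM_NONCE_LENGTH, wrapped.length - GCM_NONCE_LENGTH);
    return new SecretKeySpec(keyBytes, "AES");
  }

  public static void main(String[] args) throws Exception {
    char[] password = "changeit".toCharArray(); // illustrative only
    SecretKey tableKey = newAesKey();
    byte[] keyfile = writeKeyfile(tableKey, "table-key", password);
    SecretKey loaded = readTableKey(keyfile, "table-key", password);
    SecretKey dataKey = newAesKey();
    SecretKey recovered = unwrap(loaded, wrap(loaded, dataKey));
    System.out.println(Arrays.equals(dataKey.getEncoded(), recovered.getEncoded()));
  }
}
```

In a real setup the keyfile would be read from disk (e.g. via a `FileInputStream`) and the password would come from a secure source rather than a literal.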
Beyond the minimal version, I have a draft implementation of more advanced
data encryption options (including per-column keys, double wrapping, and
two-tier management, all described in the master design doc), but let's
take this one step at a time, starting with the simplest option.

Cheers, Gidon
