Hi all,

We briefly discussed this subject at a June sync and decided to continue the discussion on the mailing list. Jack and I have opened a number of pull requests that implement a set of disjoint elements from the high-level design <https://docs.google.com/document/d/1kkcjr9KrlB9QagRX3ToulG_Rf-65NMSlVANheDNzJq4/edit?usp=sharing>. Some low-level details, such as the generation and propagation of data keys, are not covered in that document.

I have created a short (and hopefully simple) doc <https://docs.google.com/document/d/19O_qiQumz_66CdWLpw38GFJEsUpnNxXckP9rnYIQnCo/edit?usp=sharing> that focuses on these details. It describes the bottom-up approach to generating data keys, encrypting data and delete files, and the options and phases for optimizing key management. The scope of the document is intentionally narrow: it currently focuses on the minimal, simplest option. Reviews are very welcome. Later, this doc will be merged into (or referenced from) the master design document.

Huaxin recently sent a PR with a basic encryption DDL; you can find it here <https://github.com/apache/iceberg/pull/3013>.

Next week, I'll send a pull request with an implementation of the minimal encryption option. This pull request collects the basics from my PRs 2639, 2638 and 2640 and Jack's PR 2443. It adds key generation and the other code needed for an end-to-end implementation of the minimal design <https://docs.google.com/document/d/19O_qiQumz_66CdWLpw38GFJEsUpnNxXckP9rnYIQnCo/edit?usp=sharing>. The PR comes with an example proposed by Ryan that uses a table encryption key from a keyfile in "pkcs12" format (the closest thing to the "pem" format for symmetric keys).

Besides the minimal version, I have a draft implementation of more advanced data encryption options, including per-column keys, double wrapping and two-tier management (all described in the master design doc). But let's take this one step at a time, starting with the simplest option.

Cheers,
Gidon