Hi Matt,

Thanks for putting this proposal together! It all seems reasonable to me. I
just have a few questions and comments about scope and use:

   - Is encrypting Iceberg metadata out of scope?
   - Are authentication tags (like those used in Parquet) out of scope?
   - I think one requirement should be that Iceberg doesn’t have to leak the
     association of data files to keys. To support that, I’d prefer an opaque
     byte array of “key metadata” instead of the existing struct, so the key
     metadata can be encrypted later to avoid the leak.
   - Using an opaque byte array would also allow storing more than one
     encryption key reference per file, for per-column encryption. In that
     case, the key returned by the get/put API might need to be more flexible.
   - This should also describe how to pass the key metadata to file formats
     that support encryption (or explicitly state that’s out of scope).
   - I’d like a little more detail on how keys could be looked up on the
     driver and distributed safely to tasks, to avoid a thundering herd
     problem where every task hits the key server at once.
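
A rough sketch of the opaque key-metadata and driver-side lookup ideas above, in Python for brevity. All names here (FileTask, encode_key_metadata, decode_key_metadata, resolve_keys_on_driver, fetch_keys) are hypothetical and not part of the proposal or any Iceberg API. It assumes key metadata is an opaque byte blob holding one key reference per column, and that the key server accepts a batched lookup. It does not show how key material is protected on its way to tasks.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical: key metadata is an opaque byte array stored in the manifest.
# JSON is used only to keep the sketch readable; a real key manager could
# encrypt or wrap these bytes so the file-to-key association is not visible.
def encode_key_metadata(column_key_ids: Dict[str, str]) -> bytes:
    return json.dumps(column_key_ids, sort_keys=True).encode("utf-8")


def decode_key_metadata(blob: bytes) -> Dict[str, str]:
    return json.loads(blob.decode("utf-8"))


@dataclass
class FileTask:
    path: str
    key_metadata: bytes  # opaque to Iceberg core; only the key manager reads it


def resolve_keys_on_driver(
    tasks: List[FileTask],
    fetch_keys: Callable[[List[str]], Dict[str, bytes]],
) -> Dict[str, Dict[str, bytes]]:
    """Resolve every distinct key reference once on the driver, then build a
    per-file map of column -> key material to ship with each task."""
    decoded = {task.path: decode_key_metadata(task.key_metadata) for task in tasks}
    # Deduplicate key references across all files in the scan.
    needed = sorted({key_id for refs in decoded.values() for key_id in refs.values()})
    # One batched request to the key server instead of one per task.
    keys = fetch_keys(needed)
    return {
        path: {column: keys[key_id] for column, key_id in refs.items()}
        for path, refs in decoded.items()
    }
```

Because references are deduplicated on the driver, the key server sees one batched request per scan rather than one request per task.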

Thanks!

rb

On Wed, Dec 12, 2018 at 11:44 AM Matt Cheah <mch...@palantir.com> wrote:

> Hi everyone,
>
> Encrypting data written to Iceberg tables is crucial for using this
> technology securely in industry settings. Towards that end, I’ve proposed
> an API for supporting encryption, including how users can implement their
> own custom encryption key providers and the metadata we’ll need to store in
> manifests.
>
> You can find the full spec here:
> https://docs.google.com/document/d/1LptmFB7az2rLnou27QK_KKHgjcA5vKza0dWj4h8fkno/edit
>
> The GitHub ticket tracking this is here:
> https://github.com/apache/incubator-iceberg/issues/20
>
> Feel free to provide feedback in comments on the document.
>
> Thanks!
>
> -Matt Cheah


-- 
Ryan Blue
Software Engineer
Netflix

Reply via email to