omalley edited a comment on issue #20: Encryption in Data Files
URL: https://github.com/apache/incubator-iceberg/issues/20#issuecomment-443363218

I understand that column encryption takes file format support and that isn't available yet, although it will be available for ORC soon. I haven't looked at the details of Palantir's hadoop-crypto library, but the approach looks good.

For per-file encryption, I would:
* Define the key name in the Iceberg table metadata.
* When writing, call the KMS to generate a random local key and the corresponding encrypted bytes. Create a random IV for each file. When you update the manifest for the file, add the key name, key version, encryption algorithm, IV, and encrypted local key.
* When reading, use the metadata from the manifest to have the KMS decrypt the key for that file. Use the decrypted key and IV to decrypt the file as needed.

The relevant features:
* The master key stays in the KMS and is never given to the user.
* There is only one trip to the KMS per file during reading or writing.
* The encryption never reuses a local key/IV pair. Reusing such a pair is very, very bad.
* If the user keeps a local key, it can only be used to decrypt that file.
* Rolling new versions of master keys is supported.
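
The write and read paths described above could be sketched roughly as follows. This is only an illustration under several assumptions, not Iceberg's or any KMS's actual API: `ToyKms` is a hypothetical in-memory stand-in for a real KMS, `FileKeyMetadata` and its field names are made up to mirror the manifest fields listed above, and the choices of AES/CTR for file data and AESWrap (RFC 3394) for wrapping the local key are assumptions that the comment does not specify.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class PerFileEncryptionSketch {
    static final SecureRandom RANDOM = new SecureRandom();
    static final String FILE_ALGORITHM = "AES/CTR/NoPadding"; // assumed; not named in the comment

    static byte[] randomBytes(int n) { byte[] b = new byte[n]; RANDOM.nextBytes(b); return b; }

    /** Hypothetical per-file manifest fields: key name, key version, algorithm, IV, encrypted local key. */
    static final class FileKeyMetadata {
        final String keyName; final int keyVersion; final String algorithm;
        final byte[] iv; final byte[] encryptedLocalKey;
        FileKeyMetadata(String keyName, int keyVersion, String algorithm, byte[] iv, byte[] encryptedLocalKey) {
            this.keyName = keyName; this.keyVersion = keyVersion; this.algorithm = algorithm;
            this.iv = iv; this.encryptedLocalKey = encryptedLocalKey;
        }
    }

    /** Result of one KMS call: the plaintext local key plus its encrypted form. */
    static final class LocalKey {
        final SecretKey key; final int version; final byte[] encrypted;
        LocalKey(SecretKey key, int version, byte[] encrypted) { this.key = key; this.version = version; this.encrypted = encrypted; }
    }

    static final class EncryptedFile {
        final FileKeyMetadata metadata; final byte[] ciphertext;
        EncryptedFile(FileKeyMetadata metadata, byte[] ciphertext) { this.metadata = metadata; this.ciphertext = ciphertext; }
    }

    /** Hypothetical in-memory KMS; master keys are never returned to callers. */
    static final class ToyKms {
        private final Map<String, SecretKey> masterKeys = new HashMap<>();
        private final Map<String, Integer> latestVersion = new HashMap<>();

        /** Creates version 1 of a master key, or rolls it to the next version. */
        void rollNewVersion(String keyName) {
            int version = latestVersion.merge(keyName, 1, Integer::sum);
            masterKeys.put(keyName + "@" + version, new SecretKeySpec(randomBytes(16), "AES"));
        }

        /** Generates a random local key and wraps it under the latest master key version. */
        LocalKey generateLocalKey(String keyName) {
            int version = latestVersion.get(keyName);
            SecretKey local = new SecretKeySpec(randomBytes(16), "AES");
            try {
                Cipher wrap = Cipher.getInstance("AESWrap");
                wrap.init(Cipher.WRAP_MODE, masterKeys.get(keyName + "@" + version));
                return new LocalKey(local, version, wrap.wrap(local));
            } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
        }

        /** Unwraps a local key using the master key version recorded in the manifest. */
        SecretKey decryptLocalKey(String keyName, int version, byte[] encrypted) {
            try {
                Cipher unwrap = Cipher.getInstance("AESWrap");
                unwrap.init(Cipher.UNWRAP_MODE, masterKeys.get(keyName + "@" + version));
                return (SecretKey) unwrap.unwrap(encrypted, "AES", Cipher.SECRET_KEY);
            } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
        }
    }

    static byte[] crypt(String algorithm, int mode, SecretKey key, byte[] iv, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(algorithm);
            cipher.init(mode, key, new IvParameterSpec(iv));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) { throw new IllegalStateException(e); }
    }

    /** Write path: one KMS trip, a fresh local key and IV per file, metadata destined for the manifest. */
    static EncryptedFile write(ToyKms kms, String keyName, byte[] plaintext) {
        LocalKey local = kms.generateLocalKey(keyName);
        byte[] iv = randomBytes(16);
        byte[] ciphertext = crypt(FILE_ALGORITHM, Cipher.ENCRYPT_MODE, local.key, iv, plaintext);
        return new EncryptedFile(
            new FileKeyMetadata(keyName, local.version, FILE_ALGORITHM, iv, local.encrypted), ciphertext);
    }

    /** Read path: one KMS trip to unwrap the local key, then local decryption. */
    static byte[] read(ToyKms kms, FileKeyMetadata m, byte[] ciphertext) {
        SecretKey key = kms.decryptLocalKey(m.keyName, m.keyVersion, m.encryptedLocalKey);
        return crypt(m.algorithm, Cipher.DECRYPT_MODE, key, m.iv, ciphertext);
    }

    public static void main(String[] args) {
        ToyKms kms = new ToyKms();
        kms.rollNewVersion("table-key");
        byte[] data = "row data".getBytes(StandardCharsets.UTF_8);
        EncryptedFile older = write(kms, "table-key", data);
        kms.rollNewVersion("table-key"); // files written earlier keep pointing at version 1
        System.out.println(Arrays.equals(data, read(kms, older.metadata, older.ciphertext)));
    }
}
```

Because each file records the master key version it was wrapped under, rolling the master key in this sketch does not require rewriting older files. CTR mode provides no integrity protection, so a real design would probably want an authenticated mode or a separate checksum.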