[GitHub] mccheah edited a comment on issue #20: Encryption in Data Files

GitBox Fri, 30 Nov 2018 14:21:15 -0800

mccheah edited a comment on issue #20: Encryption in Data Files
URL: https://github.com/apache/incubator-iceberg/issues/20#issuecomment-443356982
 
 
   I was considering using Palantir's [hadoop-crypto 
library](https://github.com/palantir/hadoop-crypto) to handle the encryption 
itself. What do you think of this package?
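At its core, whole-file encryption means wrapping the raw file stream in a cipher stream. The sketch below uses the JDK's `javax.crypto` with AES/CTR as a stand-in. It does not reproduce hadoop-crypto's own API. The `WholeFileCrypto` class, the 256-bit key size, and the cipher mode are assumptions for illustration. CTR alone is unauthenticated, and the per-file key and IV would still need to be stored somewhere, such as a key manager.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper: encrypts/decrypts an entire file body as one stream.
// Uses JDK javax.crypto as a stand-in; this is NOT hadoop-crypto's API.
final class WholeFileCrypto {
    // CTR mode keeps ciphertext the same length as plaintext and, in principle,
    // permits random access (seeking is not implemented in this sketch).
    private static final String TRANSFORMATION = "AES/CTR/NoPadding";
    private static final SecureRandom RANDOM = new SecureRandom();

    private WholeFileCrypto() {}

    // One fresh data key per file (assumption: 256-bit AES).
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    // A fresh 16-byte IV per file; reusing a (key, IV) pair in CTR mode leaks plaintext.
    static byte[] newIv() {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        return iv;
    }

    // Wrap the raw output stream of a data file so everything written is encrypted.
    static OutputStream encrypting(OutputStream raw, SecretKey key, byte[] iv)
            throws GeneralSecurityException {
        return new CipherOutputStream(raw, init(Cipher.ENCRYPT_MODE, key, iv));
    }

    // Wrap the raw input stream so the reader sees plaintext.
    static InputStream decrypting(InputStream raw, SecretKey key, byte[] iv)
            throws GeneralSecurityException {
        return new CipherInputStream(raw, init(Cipher.DECRYPT_MODE, key, iv));
    }

    private static Cipher init(int mode, SecretKey key, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(mode, key, new IvParameterSpec(iv));
        return cipher;
    }
}
```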
   
   Column encryption is interesting, but we haven't explored it on our side yet. 
Our internal storage solution can't store multiple keys for decrypting 
different portions of the same file, so for now we can only encrypt at the 
file level. The hadoop-crypto library reflects the same limitation. Whatever 
solution we come up with should therefore support either whole-file encryption 
_or_ per-column encryption. A given file, though, would be encrypted strictly 
one way or the other: if we encrypt the whole file, we lose most of the 
benefits of per-column encryption.
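One way to make the either/or constraint concrete is to model each file's encryption metadata as a closed set of cases. The types below (`FileEncryption`, `WholeFile`, `PerColumn`, `Unencrypted`, `EncryptionPlanner`) are hypothetical, not existing Iceberg classes, and the sketch assumes Java 17. The `keysNeeded` helper shows why whole-file encryption forfeits the per-column benefit: a reader needs the file's single key no matter which columns it projects.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical metadata describing how ONE data file is encrypted.
// The sealed hierarchy enforces "one way or the other": a file is
// whole-file encrypted, per-column encrypted, or not encrypted at all.
sealed interface FileEncryption permits WholeFile, PerColumn, Unencrypted {}

// The entire file body is encrypted under a single key.
record WholeFile(String keyId) implements FileEncryption {}

// Each listed column is encrypted under its own key; unlisted columns are plaintext.
record PerColumn(Map<String, String> keyIdByColumn) implements FileEncryption {
    PerColumn {
        keyIdByColumn = Map.copyOf(keyIdByColumn); // defensive, immutable copy
    }
}

record Unencrypted() implements FileEncryption {}

final class EncryptionPlanner {
    private EncryptionPlanner() {}

    // Key IDs a reader must fetch to read the projected columns of one file.
    static Set<String> keysNeeded(FileEncryption enc, Set<String> projectedColumns) {
        Set<String> keys = new TreeSet<>();
        if (enc instanceof WholeFile w) {
            // Whole-file: the one key is required regardless of the projection,
            // so column pruning brings no reduction in key access.
            keys.add(w.keyId());
        } else if (enc instanceof PerColumn p) {
            // Per-column: only the keys for projected, encrypted columns.
            for (String col : projectedColumns) {
                String keyId = p.keyIdByColumn().get(col);
                if (keyId != null) {
                    keys.add(keyId);
                }
            }
        }
        // Unencrypted: no keys needed.
        return keys;
    }
}
```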
   
   Additionally, a key part of performance is reducing the number of round 
trips made to the key storage backend, particularly if the backend supports 
batch operations. Ideally, the `KeyManager` would support getting and putting 
multiple keys at once, and the Spark Data Source and other Iceberg clients 
would be implemented to contact the backend as few times as possible.
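A batch-oriented `KeyManager` could look roughly like the following. The interface shape, `BatchingKeyResolver`, and the in-memory `CountingKeyManager` are all hypothetical, not an existing Iceberg API. The idea is that a scan collects the key IDs of every file it will read and resolves them in one batched call, caching results so repeated lookups skip the backend.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical batch-oriented key manager (not an existing Iceberg interface).
interface KeyManager {
    // Fetch many keys in one call; ideally a single batch request to the backend.
    Map<String, byte[]> getKeys(Collection<String> keyIds);

    // Store many keys in one call.
    void putKeys(Map<String, byte[]> keysById);
}

// Resolves all keys a scan needs with one backend call for the cache misses.
final class BatchingKeyResolver {
    private final KeyManager backend;
    private final Map<String, byte[]> cache = new HashMap<>();

    BatchingKeyResolver(KeyManager backend) {
        this.backend = backend;
    }

    Map<String, byte[]> resolveAll(Collection<String> keyIds) {
        // Deduplicate and keep only IDs not already cached.
        Set<String> missing = new LinkedHashSet<>();
        for (String id : keyIds) {
            if (!cache.containsKey(id)) {
                missing.add(id);
            }
        }
        if (!missing.isEmpty()) {
            cache.putAll(backend.getKeys(missing)); // one round trip for all misses
        }
        Map<String, byte[]> result = new HashMap<>();
        for (String id : keyIds) {
            byte[] key = cache.get(id);
            if (key == null) {
                throw new IllegalStateException("No key found for " + id);
            }
            result.put(id, key);
        }
        return result;
    }
}

// In-memory backend that counts round trips, for illustration only.
final class CountingKeyManager implements KeyManager {
    final Map<String, byte[]> store = new HashMap<>();
    int roundTrips = 0;

    @Override
    public Map<String, byte[]> getKeys(Collection<String> keyIds) {
        roundTrips++;
        Map<String, byte[]> out = new HashMap<>();
        for (String id : keyIds) {
            byte[] key = store.get(id);
            if (key != null) {
                out.put(id, key);
            }
        }
        return out;
    }

    @Override
    public void putKeys(Map<String, byte[]> keysById) {
        roundTrips++;
        store.putAll(keysById);
    }
}
```

With this shape, resolving keys for many files that share a handful of keys costs one backend request rather than one per file.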


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
