[ https://issues.apache.org/jira/browse/HIVE-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777416#comment-13777416 ]
Larry McCay commented on HIVE-5207:
-----------------------------------

Hi Jerry -

I have taken a high-level look through the patch. Lots of good stuff there - good work!

There are a couple of things that I would like to see more javadocs on, and perhaps a document that describes the use cases:

1. TwoTieredKey - exactly what its purpose is, how it's used, what the tiers are, etc.
2. External KeyManagement integration - where it happens and what the expected contract for this integration is
3. A specific use case description for exporting keys into an external keystore, including who has the authority to initiate the export and where the password comes from
4. An explanation as to why we should ever store the key with the data, which seems like a bad idea. I understand that it is encrypted with the master secret - which takes me to the next question. :)
5. Where the master secret is established and stored, and how it is protected

There is a minor typo/spelling error that you probably want to fix now rather than later:

+public interface HiveKeyResolver {
+  void init(Configuration conf) throws CryptoException;
+
+  /**
+   * Resolve the key meta information of a table
+   * @param tableDesc The table descriptor
+   */
+  KeyMeta resovleKey(TableDesc tableDesc);
+}

Change resovleKey to resolveKey here and in the interface implementation and consumer of the method - I think there were 3 instances.

Again, nice work here! Let's get some higher-level descriptions in code javadocs and/or separate documents.

Thanks!

> Support data encryption for Hive tables
> ---------------------------------------
>
>                 Key: HIVE-5207
>                 URL: https://issues.apache.org/jira/browse/HIVE-5207
>             Project: Hive
>          Issue Type: New Feature
>    Affects Versions: 0.12.0
>            Reporter: Jerry Chen
>              Labels: Rhino
>         Attachments: HIVE-5207.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> For sensitive and legally protected data such as personal information, it is
> common practice to store the data encrypted in the file system. Enabling
> Hive to store and query encrypted data is crucial for Hive data analysis in
> the enterprise.
>
> When creating a table, the user can specify whether it is an encrypted table
> by specifying a property in TBLPROPERTIES. Once an encrypted table is
> created, queries on it are transparent as long as the corresponding key
> management facilities are set up in the environment where the query runs. We
> can use the Hadoop crypto provided by HADOOP-9331 for the underlying data
> encryption and decryption.
>
> As to key management, we would support several common key management use
> cases. First, the table key (data key) can be stored in the Hive metastore,
> associated with the table in its properties. The table key can be explicitly
> specified or auto-generated, and will be encrypted with a master key. There
> are cases where the data being processed is generated by other applications,
> so we need to support externally managed or imported table keys. Also, the
> data generated by Hive may be consumed by other applications in the system,
> so we need a tool or command for exporting the table key to a Java keystore
> for external use.
>
> To handle versions of Hadoop that do not have crypto support, we can avoid
> compilation problems by segregating crypto API usage into separate files
> (shims) to be included only if a flag is defined on the Ant command line
> (something like -Dcrypto=true).
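
As a rough illustration of the two-tier scheme in the issue description (a table key that is stored encrypted with a master key), here is a minimal sketch using only the standard javax.crypto API. It is not code from the HIVE-5207 patch: the class and method names are hypothetical, and AES key wrap with 128-bit keys is an assumed choice, since the description does not say which algorithm protects the table key.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

/** Hypothetical illustration only; not a class from the HIVE-5207 patch. */
final class TableKeyWrapSketch {

  /** Generates a fresh 128-bit AES key (used here for both master and table keys). */
  static SecretKey newAesKey() throws GeneralSecurityException {
    KeyGenerator generator = KeyGenerator.getInstance("AES");
    generator.init(128);
    return generator.generateKey();
  }

  /** Wraps (encrypts) the table key under the master key with AES key wrap. */
  static byte[] wrapTableKey(SecretKey masterKey, SecretKey tableKey)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.WRAP_MODE, masterKey);
    return cipher.wrap(tableKey);
  }

  /** Recovers the table key from its wrapped form using the master key. */
  static SecretKey unwrapTableKey(SecretKey masterKey, byte[] wrappedKey)
      throws GeneralSecurityException {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.UNWRAP_MODE, masterKey);
    return (SecretKey) cipher.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);
  }

  public static void main(String[] args) throws GeneralSecurityException {
    SecretKey masterKey = newAesKey();  // tier 1: assumed to live outside the metastore
    SecretKey tableKey = newAesKey();   // tier 2: per-table data key

    // Only the wrapped bytes would be stored alongside the table properties.
    byte[] wrapped = wrapTableKey(masterKey, tableKey);
    SecretKey recovered = unwrapTableKey(masterKey, wrapped);

    System.out.println("round trip ok: "
        + Arrays.equals(tableKey.getEncoded(), recovered.getEncoded()));
  }
}
```

In a scheme like this, whoever holds the master key can recover every table key, which is why where the master secret is stored and how it is protected matters so much.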