[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergio Peña updated HIVE-8065:
------------------------------
    Sprint:   (was: Sprint - Jan - Mar)

> Support HDFS encryption functionality on Hive
> ---------------------------------------------
>
>                 Key: HIVE-8065
>                 URL: https://issues.apache.org/jira/browse/HIVE-8065
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.1
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>              Labels: Hive-Scrum
>
> The new encryption support in HDFS makes Hive unusable when this feature is
> enabled.
> HDFS encryption is designed so that a user can configure different
> encryption zones (or directories) for multi-tenant environments. Each
> encryption zone has its own exclusive encryption key, such as AES-128 or
> AES-256. For security compliance, HDFS does not allow files to be moved or
> renamed between encryption zones. Renames are allowed only inside the same
> encryption zone. Copies are allowed between encryption zones.
> See HDFS-6134 for more details about the HDFS encryption design.
>
> Hive currently uses a scratch directory (like /tmp/$user/$random). This
> scratch directory holds the intermediate data (between MR jobs) and the
> final output of the Hive query, which is later moved to the table directory
> location.
> If Hive tables are in a different encryption zone than the scratch
> directory, Hive won't be able to rename those files/directories, and the
> query will fail.
>
> To handle this problem, we can change the scratch directory of the
> query/statement to be inside the same encryption zone as the table directory
> location. This way, the rename will succeed.
> Also, for statements that move files between encryption zones (e.g. LOAD
> DATA), a copy may be executed instead of a rename. This adds overhead when
> copying large data files, but it won't break encryption on Hive.
>
> Another security concern arises with SELECT statements that use joins. If
> Hive joins tables with different encryption key strengths, the results of
> the select might break the security compliance of the tables. Say two tables
> with 128-bit and 256-bit encryption are joined: the temporary results might
> be stored in the 128-bit encryption zone, which conflicts with the
> requirements of the table encrypted with 256 bits.
> To fix this, Hive should select the scratch directory with the strongest
> encryption, so that intermediate data is stored temporarily with no
> compliance issues.
>
> For instance:
> {noformat}
> SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
> {noformat}
> - This should use a scratch directory (or staging directory) inside the
> table-aes256 table location.
> {noformat}
> INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
> {noformat}
> - This should use a scratch directory inside the table-aes1 location.
> {noformat}
> FROM table-unencrypted
> INSERT OVERWRITE TABLE table-aes128 SELECT id, name
> INSERT OVERWRITE TABLE table-aes256 SELECT id, name
> {noformat}
> - This should use a scratch directory in each of the table locations.
> - The first SELECT will have its scratch directory in the table-aes128
> directory.
> - The second SELECT will have its scratch directory in the table-aes256
> directory.
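
The selection rules described in this issue can be sketched roughly as follows. This is only an illustration in Python, not Hive code: TableInfo, pick_scratch_dir, move_strategy, the ".hive-staging" directory name and the /warehouse paths are all hypothetical names chosen for the example, and the sketch assumes the zone and key length of each table are already known.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TableInfo:
    """Hypothetical summary of a table (not a Hive class)."""
    name: str
    location: str                # HDFS directory of the table
    zone: Optional[str] = None   # encryption zone root, None if unencrypted
    key_bits: int = 0            # key length in bits, 0 if unencrypted


def pick_scratch_dir(tables: Sequence[TableInfo],
                     default: str = "/tmp/hive-scratch") -> str:
    """Pick a staging directory inside the most strongly encrypted table.

    Falls back to the default scratch directory when no table is encrypted.
    """
    encrypted = [t for t in tables if t.zone is not None]
    if not encrypted:
        return default
    strongest = max(encrypted, key=lambda t: t.key_bits)
    # ".hive-staging" is an assumed name for the per-table staging directory.
    return posixpath.join(strongest.location, ".hive-staging")


def move_strategy(src_zone: Optional[str], dst_zone: Optional[str]) -> str:
    """Renames only work inside one zone; anything else needs a copy."""
    return "rename" if src_zone == dst_zone else "copy"


# The tables from the examples in the description (paths are made up).
plain = TableInfo("table-unencrypted", "/warehouse/table-unencrypted")
t128 = TableInfo("table-aes128", "/warehouse/table-aes128",
                 zone="/warehouse/table-aes128", key_bits=128)
t256 = TableInfo("table-aes256", "/warehouse/table-aes256",
                 zone="/warehouse/table-aes256", key_bits=256)

# Join of a 128-bit and a 256-bit table: stage inside the 256-bit table.
join_scratch = pick_scratch_dir([t128, t256])

# Multi-insert: one scratch directory per target table.
multi_insert_scratch = [pick_scratch_dir([plain, target])
                        for target in (t128, t256)]

# LOAD DATA from an unencrypted path into an encrypted table needs a copy.
load_strategy = move_strategy(None, t256.zone)
```

Comparing key lengths alone is a simplification. A real implementation would have to ask HDFS for the encryption zone and key of each path, and might need a stricter notion of "more secure" than bit length.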