[ 
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538548#comment-14538548
 ] 

Mostafa Mokhtar commented on HIVE-8065:
---------------------------------------

[~spena]

I am running on a cluster that has NameNode HA enabled and I can't run 
queries. Was NameNode HA tested with this change?

This is the call stack I am getting:
{code}
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1866)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:1943)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1975)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1788)
        ... 25 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://namenode:8020/apps/hive/warehouse/test_table, expected: hdfs://namenode
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
        at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
        at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1245)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1862)
        ... 28 more
{code}

I stepped through with the debugger and found that the authorities mismatch:
thisAuthority "namenode" (id=86)
thatAuthority "namenode:8020" (id=99)
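
Here is a minimal sketch of how I believe the mismatch arises. It assumes the 
HA setup on my cluster, where fs.defaultFS is the logical nameservice URI 
(hdfs://namenode, no port) while the fully qualified table path carries the 
RPC port:
{code}
// Sketch only: assumes an HA configuration where fs.defaultFS is the
// logical nameservice "hdfs://namenode" (no port), as on my cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AuthorityMismatchRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://namenode");   // logical nameservice, no port

    // FileSystem URI authority is "namenode"
    FileSystem fs = FileSystem.get(conf);

    // Path authority is "namenode:8020", so FileSystem.checkPath() compares
    // "namenode" vs "namenode:8020" and is expected to throw
    // IllegalArgumentException("Wrong FS: ...") before any RPC is made.
    Path tablePath = new Path("hdfs://namenode:8020/apps/hive/warehouse/test_table");
    fs.getFileStatus(tablePath);
  }
}
{code}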

Then, in FileSystem.checkPath, when the authorities mismatch the code falls 
through and throws the exception:
{code}
        thatAuthority = uri.getAuthority();
        if (thisAuthority == thatAuthority ||       // authorities match
            (thisAuthority != null && 
             thisAuthority.equalsIgnoreCase(thatAuthority)))
          return;
      }
    }
    throw new IllegalArgumentException("Wrong FS: "+path+
                                       ", expected: "+this.getUri());
{code}
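
A possible workaround on the Hive side could be to strip the scheme and 
authority before handing the path to the encryption shim, so that HdfsAdmin 
resolves it against the FileSystem it was created from. This is only a sketch 
(the helper class is hypothetical, not the current SemanticAnalyzer code), and 
I haven't verified it against the HA code path:
{code}
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.shims.HadoopShims;

public final class EncryptionPathCheck {
  // Hypothetical helper, not existing Hive code.
  public static boolean isPathEncrypted(Path path,
      HadoopShims.HdfsEncryptionShim shim) throws IOException {
    // e.g. hdfs://namenode:8020/apps/hive/warehouse/test_table
    //   -> /apps/hive/warehouse/test_table
    // Without a scheme/authority the path is resolved against the FileSystem
    // the HdfsAdmin was constructed from, so checkPath() compares equal URIs.
    Path withoutAuthority = Path.getPathWithoutSchemeAndAuthority(path);
    return shim.isPathEncrypted(withoutAuthority);
  }
}
{code}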

Is this an issue with HDFS encryption support in Hive, or do you think this is 
a configuration issue?

On a related note, I don't think Hive should be checking whether the staging 
directory is encrypted if none of the Hive-managed tables are encrypted.
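
Something along these lines could short-circuit the check when the cluster has 
no encryption zones at all. Again just a sketch (the class is hypothetical, 
and listEncryptionZones() typically requires HDFS superuser privileges, so it 
may not be callable from an ordinary Hive client):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public final class EncryptionZoneProbe {
  // Hypothetical helper, not existing Hive code: returns true if at least
  // one encryption zone is configured on the default HDFS filesystem.
  public static boolean clusterHasEncryptionZones(Configuration conf)
      throws IOException {
    HdfsAdmin admin = new HdfsAdmin(FileSystem.getDefaultUri(conf), conf);
    // listEncryptionZones() returns a RemoteIterator; one hasNext() call is
    // enough to tell whether any zone exists.
    return admin.listEncryptionZones().hasNext();
  }
}
{code}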

> Support HDFS encryption functionality on Hive
> ---------------------------------------------
>
>                 Key: HIVE-8065
>                 URL: https://issues.apache.org/jira/browse/HIVE-8065
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 0.13.1
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>              Labels: Hive-Scrum
>
> The new encryption support on HDFS makes Hive incompatible and unusable when 
> this feature is used.
> HDFS encryption is designed so that a user can configure different 
> encryption zones (directories) for multi-tenant environments. Each 
> encryption zone has its own encryption key, such as AES-128 or AES-256. 
> For security compliance, HDFS does not allow moving/renaming files between 
> encryption zones; renames are allowed only inside the same encryption zone, 
> while copies are allowed between encryption zones.
> See HDFS-6134 for more details about HDFS encryption design.
> Hive currently uses a scratch directory (like /tmp/$user/$random). This 
> scratch directory is used for the output of intermediate data (between MR 
> jobs) and for the final output of the Hive query, which is later moved to the 
> table directory location.
> If Hive tables are in different encryption zones than the scratch directory, 
> then Hive won't be able to rename those files/directories, and this makes 
> Hive unusable.
> To handle this problem, we can change the scratch directory of the 
> query/statement to be inside the same encryption zone as the table directory 
> location. This way, the renaming process will succeed.
> Also, for statements that move files between encryption zones (e.g. LOAD 
> DATA), a copy may be executed instead of a rename. This will cause an 
> overhead when copying large data files, but it won't break the encryption on 
> Hive.
> Another security aspect to consider is joins in SELECT statements. If Hive 
> joins tables that use different encryption key strengths, then the results 
> of the select might break the security compliance of the tables. Say two 
> tables encrypted with 128-bit and 256-bit keys are joined; the temporary 
> results might be stored in the 128-bit encryption zone, which conflicts with 
> the compliance requirements of the table encrypted with the 256-bit key.
> To fix this, Hive should be able to select the most strongly encrypted 
> scratch directory in order to store the intermediate data temporarily with 
> no compliance issues.
> For instance:
> {noformat}
> SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
> {noformat}
> - This should use a scratch directory (or staging directory) inside the 
> table-aes256 table location.
> {noformat}
> INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
> {noformat}
> - This should use a scratch directory inside the table-aes1 location.
> {noformat}
> FROM table-unencrypted
> INSERT OVERWRITE TABLE table-aes128 SELECT id, name
> INSERT OVERWRITE TABLE table-aes256 SELECT id, name
> {noformat}
> - This should use a scratch directory on each of the tables locations.
> - The first SELECT will have its scratch directory on table-aes128 directory.
> - The second SELECT will have its scratch directory on table-aes256 directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
