
Takanobu Asanuma commented on HIVE-11527:

Hi, [~sershe], [~vgumashta], and other experts.

I just uploaded a new patch to Review Board. I think I have almost finished 
implementing the features, so I'd like to summarize my changes.

*How to use the bypass*
When {{hive.server2.webhdfs.bypass.enabled}} is true, users can use the bypass. 
The default is false.
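
As a minimal sketch of one way a client might turn the flag on: the HiveServer2 JDBC URL accepts Hive configuration overrides after a {{?}}. Whether this patch honors a per-session override, or requires the property to be set on the server side, is not stated here and is assumed for illustration. The helper class {{BypassJdbcUrl}}, the host, the port, and the database name are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not part of the patch): builds a HiveServer2 JDBC URL
// that carries Hive configuration overrides in the "?key=value;..." section.
final class BypassJdbcUrl {
    // Property name taken from the comment above.
    static final String BYPASS_ENABLED = "hive.server2.webhdfs.bypass.enabled";

    static String build(String host, int port, String database,
                        Map<String, String> hiveConf) {
        StringBuilder url = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port)
                .append('/').append(database);
        if (!hiveConf.isEmpty()) {
            // Hive conf entries are separated by ';' after a leading '?'.
            StringJoiner conf = new StringJoiner(";", "?", "");
            hiveConf.forEach((key, value) -> conf.add(key + "=" + value));
            url.append(conf.toString());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(BYPASS_ENABLED, "true"); // assumption: settable per session
        System.out.println(build("localhost", 10000, "default", conf));
    }
}
```

The resulting string could be passed to {{DriverManager.getConnection}} with the Hive JDBC driver on the classpath.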

I added some unit tests to {{TestJdbcWithMiniHS2}}, {{TestJdbcWithMiniMr}}, and 
{{TestJdbcWithMiniHA}}. They should help with debugging.

*Changing thrift API*
I added three optional fields to the response that HS2 sends to JDBC drivers 
after executing a query.
* {{finalDirUri}}: the path of the directory that holds the final data
* {{haConf}}: configurations for Namenode HA
* {{typeName}}: a type name for complex columns
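
To illustrate how a client could react to these optional fields, here is a rough sketch. The class {{BypassResponse}}, its constructor, and the {{FetchMode}} enum are hypothetical and are not the generated Thrift types. Representing {{haConf}} as a plain string, and falling back to the normal Thrift fetch when {{finalDirUri}} is absent, are both assumptions.

```java
import java.util.Optional;

// Hypothetical client-side holder for the three optional response fields.
// The real patch uses generated Thrift classes; this only models "optional".
final class BypassResponse {
    enum FetchMode { DIRECT_FROM_HDFS, VIA_THRIFT }

    private final String finalDirUri; // null when the server did not set it
    private final String haConf;      // assumption: carried as a string
    private final String typeName;

    BypassResponse(String finalDirUri, String haConf, String typeName) {
        this.finalDirUri = finalDirUri;
        this.haConf = haConf;
        this.typeName = typeName;
    }

    Optional<String> finalDirUri() { return Optional.ofNullable(finalDirUri); }
    Optional<String> haConf()      { return Optional.ofNullable(haConf); }
    Optional<String> typeName()    { return Optional.ofNullable(typeName); }

    // Assumption: without a final directory URI the client cannot bypass
    // HS2 and must fetch rows through the regular Thrift path.
    FetchMode fetchMode() {
        return finalDirUri().isPresent()
                ? FetchMode.DIRECT_FROM_HDFS
                : FetchMode.VIA_THRIFT;
    }
}
```

Because the fields are optional, a driver written this way would keep working against a server that never sets them.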

*Decoding data*
Client-side decoding of the data is implemented in {{HiveQueryResultSet}}. In 
the latest patch, to keep the code simple, clients can use the bypass only when 
the final data is a SequenceFile, which is the default format for final data. I 
think clients rarely change the default format.

*Handling HA*
When the NameNode is configured for HA, clients need some configuration values 
that exist only on the cluster side. They are passed in 
{{Driver#getFinalDirName}}.
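
For context, the sketch below assembles the standard HDFS client-side keys that a reader typically needs to resolve an HA nameservice over WebHDFS. Which keys the patch actually ships in {{haConf}}, and in what format, is not stated here; the class {{HaClientConf}}, the nameservice name, and the addresses are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the usual HDFS HA client settings for one
// nameservice. The patch may pass a different set of keys.
final class HaClientConf {
    static Map<String, String> forNameservice(
            String nameservice, Map<String, String> httpAddressByNamenodeId) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("dfs.nameservices", nameservice);
        // Comma-separated logical NameNode IDs, e.g. "nn1,nn2".
        conf.put("dfs.ha.namenodes." + nameservice,
                String.join(",", httpAddressByNamenodeId.keySet()));
        // One HTTP address per NameNode, used by WebHDFS clients.
        httpAddressByNamenodeId.forEach((id, address) ->
                conf.put("dfs.namenode.http-address." + nameservice + "." + id,
                        address));
        conf.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha."
                        + "ConfiguredFailoverProxyProvider");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> namenodes = new LinkedHashMap<>();
        namenodes.put("nn1", "host1.example.com:50070");
        namenodes.put("nn2", "host2.example.com:50070");
        forNameservice("mycluster", namenodes)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

A client outside the cluster has no {{hdfs-site.xml}} to read these values from, which is why the server has to supply them.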

*Cases where the bypass cannot be used*
In some cases, it is difficult to use the bypass. I listed these cases in 
{{TestJdbcWithMiniHS2#testUnableUseBypassCase}}. {{Driver#useBypass}} decides 
whether clients can use the bypass.

Some bugs and room for optimization may remain. Please review the patch when 
you have time.

Thank you very much for reading this long comment!

> bypass HiveServer2 thrift interface for query results
> -----------------------------------------------------
>                 Key: HIVE-11527
>                 URL: https://issues.apache.org/jira/browse/HIVE-11527
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>            Reporter: Sergey Shelukhin
>            Assignee: Takanobu Asanuma
>         Attachments: HIVE-11527.WIP.patch
> Right now, HS2 reads query results and returns them to the caller via its 
> thrift API.
> There should be an option for HS2 to return some pointer to results (an HDFS 
> link?) and for the user to read the results directly off HDFS inside the 
> cluster, or via something like WebHDFS outside the cluster
> Review board link: https://reviews.apache.org/r/40867
