[jira] [Commented] (HIVE-19821) Distributed HiveServer2

Kai Zheng (JIRA) Tue, 03 Jul 2018 18:54:09 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532142#comment-16532142
 ]


Kai Zheng commented on HIVE-19821:
----------------------------------

Hi Sahil,

This is a nice proposal. Moving the heavy HS2 work out to the container 
level for isolation and scalability looks promising. I haven't dug into 
the details yet, but I have some questions:
 # In HoS, the Spark context resides in a separate JVM. Since the main work 
of HS2 will now run in a container/JVM per session/user, would it be good to 
combine the two, consolidating the Spark context back into the new HS2 
container for efficiency?
 # I like the architecture diagram in [Apache 
Livy|https://livy.incubator.apache.org/]; it would be good to have a similar 
one in the design.
 # Will this approach affect security, such as authentication and authorization?
 # What about considerations like backward compatibility and interfaces/tools?

Thank you.

> Distributed HiveServer2
> -----------------------
>
>                 Key: HIVE-19821
>                 URL: https://issues.apache.org/jira/browse/HIVE-19821
>             Project: Hive
>          Issue Type: New Feature
>          Components: HiveServer2
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-19821.1.WIP.patch, HIVE-19821.2.WIP.patch, 
> HIVE-19821_ Distributed HiveServer2.pdf
>
>
> HS2 deployments often hit OOM issues due to a number of factors: (1) too many 
> concurrent connections, (2) queries that scan a large number of partitions have 
> to pull a lot of metadata into memory (e.g. a query reading thousands of 
> partitions requires loading thousands of partitions into memory), (3) very 
> large queries can take up a lot of heap space, especially during query 
> parsing. There are a number of other factors that cause HiveServer2 to run 
> out of memory; these are just some of the more common ones.
> Distributed HS2 proposes to do all query parsing, compilation, planning, and 
> execution coordination inside a dedicated container. This should 
> significantly decrease memory pressure on HS2 and allow HS2 to scale to a 
> larger number of concurrent users.
> For HoS (and I think Hive-on-Tez) this just requires moving all query 
> compilation, planning, etc. inside the application master for the 
> corresponding Hive session.
> The main benefit here is isolation. A poorly written Hive query cannot bring 
> down an entire HiveServer2 instance and force all other queries to fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
