[ https://issues.apache.org/jira/browse/HIVE-19821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532142#comment-16532142 ]
Kai Zheng commented on HIVE-19821:
----------------------------------

Hi Sahil,

This is a nice proposal. Separating the heavy HS2 tasks out to the container level for isolation and scalability looks promising. I have not dug into the details yet, but I have some questions:
# In HoS, the Spark context resides in a separate JVM. Since the main work of HS2 would now run in a container/JVM per session/user, would it be good to combine the two, consolidating the Spark context back into the new HS2 container, for efficiency?
# I like the architecture diagram in [Apache Livy|https://livy.incubator.apache.org/]; it would be good to have a similar one in the design.
# Will this approach affect security, such as authentication and authorization?
# What about considerations like backward compatibility and interfaces/tools?

Thank you.

> Distributed HiveServer2
> -----------------------
>
>                 Key: HIVE-19821
>                 URL: https://issues.apache.org/jira/browse/HIVE-19821
>             Project: Hive
>          Issue Type: New Feature
>          Components: HiveServer2
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-19821.1.WIP.patch, HIVE-19821.2.WIP.patch,
> HIVE-19821_ Distributed HiveServer2.pdf
>
>
> HS2 deployments often hit OOM issues due to a number of factors: (1) too many
> concurrent connections, (2) queries that scan a large number of partitions have
> to pull a lot of metadata into memory (e.g. a query reading thousands of
> partitions requires loading thousands of partitions into memory), (3) very
> large queries can take up a lot of heap space, especially during query
> parsing. There are a number of other factors that cause HiveServer2 to run
> out of memory; these are just some of the more common ones.
> Distributed HS2 proposes to do all query parsing, compilation, planning, and
> execution coordination inside a dedicated container. This should
> significantly decrease memory pressure on HS2 and allow HS2 to scale to a
> larger number of concurrent users.
> For HoS (and I think Hive-on-Tez) this just requires moving all query
> compilation, planning, etc. inside the application master for the
> corresponding Hive session.
> The main benefit here is isolation. A poorly written Hive query cannot bring
> down an entire HiveServer2 instance and force all other queries to fail.