[ https://issues.apache.org/jira/browse/HIVE-18338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amruth S updated HIVE-18338:
----------------------------
    Description:
This exposes an async API in HiveStatement (jdbc module).

The JDBC interface has always had strictly synchronous APIs, so the Hive JDBC implementation had to follow it even though the Hive server is fully asynchronous.
Developers trying to build proxies on top of Hive servers end up writing a Thrift client from scratch to make it asynchronous and robust to restarts.

The common pattern is
# Submit the query, get the operation handle, and store it in a persistent store
# Poll and wait for completion
# Stream results
# In the event of a restart, restore the OperationHandle from the persistent store and continue execution.

The patch does 2 things
* exposes the operation handle (once a query is submitted)
{{getOperationhandle()}}
Developers can persist this along with the actual Hive server URL
{{getJdbcUrl}}
* latch APIs
Developers can create a statement and latch on to an operation handle that was persisted earlier. For latch, the statement should be created from a connection to the actual Hive server URI on which the query was submitted.

  was:
Many users are struggling and rewriting a lot of boilerplate over Thrift to get purely asynchronous capability. The idea is to expose the operation handle, so that clients can persist it and later latch on to the same execution.

*Problem statement*

Hive JDBC currently exposes 2 methods related to asynchronous execution:
*executeAsync()* - triggers a query execution and returns immediately.
*waitForOperationToComplete()* - waits until the current execution is complete, *blocking the user thread*.

This has one problem: if the client process goes down, there is no way to resume queries, although the Hive server is completely asynchronous.

*Proposal*

If the operation handle is exposed, we can latch on to an active execution of a query.

*Code changes*

The operation handle is exposed, so the client can keep a copy.
The latchSync() and latchAsync() methods take an operation handle and try to latch on to the current execution in the Hive server, if present.


> [Client, JDBC] Expose async interface through hive JDBC.
> --------------------------------------------------------
>
>                 Key: HIVE-18338
>                 URL: https://issues.apache.org/jira/browse/HIVE-18338
>             Project: Hive
>          Issue Type: Improvement
>          Components: Clients, JDBC
>    Affects Versions: 2.3.2
>            Reporter: Amruth S
>            Assignee: Amruth S
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-18338.patch, HIVE-18338.patch.1, HIVE-18338.patch.2, HIVE-18338.patch.3
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
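
The common pattern listed in the description (submit, persist the handle, poll, stream results, resume after a restart) can be sketched roughly as follows. This is only an illustration under stated assumptions: it does not use any Hive classes, and `AsyncQueryServer`, `HandleStore`, `FakeServer`, `InMemoryStore`, and `ResumableClient` are hypothetical names invented here. With the patch, the persisted handle and URL would presumably come from `getOperationhandle()` and `getJdbcUrl` as the description states; the host name in the URL below is a placeholder.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo: submit with one client, "restart", then resume with a fresh client.
class ResumeDemo {
    public static void main(String[] args) {
        AsyncQueryServer server = new FakeServer();
        HandleStore store = new InMemoryStore();
        String url = "jdbc:hive2://example-host:10000/default"; // placeholder URL

        new ResumableClient(server, url, store).submit("q1", "SELECT 1");
        // Simulated proxy restart: a new client instance that shares only the store.
        List<String> rows = new ResumableClient(server, url, store).resume("q1");
        System.out.println(rows);
    }
}

// Hypothetical stand-in for a fully asynchronous query server. NOT the Hive API.
interface AsyncQueryServer {
    String submit(String sql);                 // returns an opaque operation handle
    boolean isComplete(String handle);         // non-blocking status check
    List<String> fetchResults(String handle);  // meaningful once isComplete() is true
}

// Hypothetical persistent store; a real proxy would use a database or a file.
interface HandleStore {
    void save(String queryId, String serverUrl, String handle);
    String[] load(String queryId);             // {serverUrl, handle}, or null if unknown
}

// In-memory fake server: each query reports completion after a few status polls.
class FakeServer implements AsyncQueryServer {
    private final Map<String, Integer> pollsLeft = new HashMap<>();
    private int nextId = 0;

    public String submit(String sql) {
        String handle = "op-" + (nextId++);
        pollsLeft.put(handle, 3);
        return handle;
    }

    public boolean isComplete(String handle) {
        Integer left = pollsLeft.get(handle);
        if (left == null) {
            throw new IllegalArgumentException("unknown handle: " + handle);
        }
        if (left > 0) {
            pollsLeft.put(handle, left - 1);
        }
        return left == 0;
    }

    public List<String> fetchResults(String handle) {
        return Arrays.asList("row-1", "row-2");
    }
}

class InMemoryStore implements HandleStore {
    private final Map<String, String[]> entries = new HashMap<>();

    public void save(String queryId, String serverUrl, String handle) {
        entries.put(queryId, new String[] {serverUrl, handle});
    }

    public String[] load(String queryId) {
        return entries.get(queryId);
    }
}

class ResumableClient {
    private final AsyncQueryServer server;
    private final String serverUrl;
    private final HandleStore store;

    ResumableClient(AsyncQueryServer server, String serverUrl, HandleStore store) {
        this.server = server;
        this.serverUrl = serverUrl;
        this.store = store;
    }

    // Step 1: submit the query and persist the handle together with the server URL.
    void submit(String queryId, String sql) {
        store.save(queryId, serverUrl, server.submit(sql));
    }

    // Steps 2-4: restore the handle, poll until completion, then return the results.
    List<String> resume(String queryId) {
        String[] saved = store.load(queryId);
        if (saved == null) {
            throw new IllegalStateException("no persisted handle for " + queryId);
        }
        if (!serverUrl.equals(saved[0])) {
            // The handle is only meaningful on the server that issued it.
            throw new IllegalStateException("handle belongs to " + saved[0]);
        }
        while (!server.isComplete(saved[1])) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
        }
        return server.fetchResults(saved[1]);
    }
}
```

The URL check in `resume` mirrors the note in the description that a latch must go through a connection to the same server on which the query was submitted.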