[ https://issues.apache.org/jira/browse/HIVE-22193?focusedWorklogId=792995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792995 ]
ASF GitHub Bot logged work on HIVE-22193: ----------------------------------------- Author: ASF GitHub Bot Created on: 20/Jul/22 02:35 Start Date: 20/Jul/22 02:35 Worklog Time Spent: 10m Work Description: sourabh912 commented on code in PR #3386: URL: https://github.com/apache/hive/pull/3386#discussion_r924864971 ########## service/src/java/org/apache/hive/service/CompositeService.java: ########## @@ -94,6 +94,19 @@ public synchronized void stop() { super.stop(); } + @Override + public synchronized void decommission() { Review Comment: nit: add a documentation that services are decommissioned in the reverse order in which they were added. ########## service/src/java/org/apache/hive/service/server/HiveServer2.java: ########## @@ -895,6 +897,60 @@ private void stopOrDisconnectTezSessions() { } } + /** + * Decommission HiveServer2. As a consequence, SessionManager stops + * opening new sessions, OperationManager refuses running new queries and + * HiveServer2 deregisters itself from Zookeeper if service discovery is enabled, + * but the decommissioning has no effect on the current running queries. + */ + public synchronized void decommission() { + LOG.info("Decommissioning HiveServer2"); + // Remove this server instance from ZooKeeper if dynamic service discovery is set + if (zooKeeperHelper != null && !isDeregisteredWithZooKeeper()) { + try { + zooKeeperHelper.removeServerInstanceFromZooKeeper(); + } catch (Exception e) { + LOG.error("Error removing znode for this HiveServer2 instance from ZooKeeper.", e); + } + } + super.decommission(); + } + + public synchronized void graceful_stop() { + try { + decommission(); + long maxTimeForWait = HiveConf.getTimeVar(getHiveConf(), + HiveConf.ConfVars.HIVE_SERVER2_GRACEFUL_STOP_TIMEOUT, TimeUnit.MILLISECONDS); + if (maxTimeForWait > 0) { + ExecutorService service = Executors.newSingleThreadExecutor(); + Future future = service.submit(() -> { + // For gracefully stopping, sleeping some time while looping does not bring much overhead, + // that is, at most 100ms are wasted for waiting for OperationManager to be done, + // and this code path will only be executed when HS2 is being terminated. + long sleepInterval = Math.min(100, maxTimeForWait); + while (getCliService() != null && getCliService().getSessionManager() + .getOperations().size() != 0) { Review Comment: Should a better check be ``` while(getCliService() != null && getCliService().getSessionManager() != null && getCliService().getSessionManager().getOperations().size() > 0) {} ``` ########## service/src/java/org/apache/hive/service/CompositeService.java: ########## @@ -94,6 +94,19 @@ public synchronized void stop() { super.stop(); } + @Override + public synchronized void decommission() { Review Comment: Also return early if the service is already decommissioned? ``` if (this.getServiceState() == State.DECOMMISSIONING) { return } ``` ########## service/src/java/org/apache/hive/service/server/HiveServer2.java: ########## @@ -895,6 +897,46 @@ private void stopOrDisconnectTezSessions() { } } + public synchronized void decommission() { + LOG.info("Decommissioning HiveServer2"); + // Remove this server instance from ZooKeeper if dynamic service discovery is set + if (serviceDiscovery && !activePassiveHA && zooKeeperHelper != null) { + try { + zooKeeperHelper.removeServerInstanceFromZooKeeper(); + } catch (Exception e) { + LOG.error("Error removing znode for this HiveServer2 instance from ZooKeeper.", e); + } + } + super.decommission(); + } + + public synchronized void graceful_stop() { + try { + decommission(); + long maxTimeForWait = HiveConf.getTimeVar(getHiveConf(), + HiveConf.ConfVars.HIVE_SERVER2_GRACEFUL_STOP_TIMEOUT, TimeUnit.MILLISECONDS); + if (maxTimeForWait > 0) { + ExecutorService service = Executors.newSingleThreadExecutor(); + Future future = service.submit(() -> { + while (getCliService() != null && getCliService().getSessionManager() Review Comment: Thank you for addressing my comment. Any particular reason why we need a separate thread to wait on operations to finish? Why can't it be done in same thread calling `graceful_stop` ? ########## service/src/java/org/apache/hive/service/AbstractService.java: ########## @@ -125,7 +145,6 @@ public synchronized void stop() { // started (eg another service failing canceled startup) return; } - ensureCurrentState(STATE.STARTED); Review Comment: Understood. Instead of removing the check, shall we change it to one mentioned in my previous comment? Issue Time Tracking ------------------- Worklog Id: (was: 792995) Time Spent: 5h (was: 4h 50m) > Graceful Shutdown HiveServer2 > ----------------------------- > > Key: HIVE-22193 > URL: https://issues.apache.org/jira/browse/HIVE-22193 > Project: Hive > Issue Type: Improvement > Components: Server Infrastructure > Reporter: chenshiyun > Assignee: Zhihua Deng > Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > We have a lot of HiveSever2 servers deployed on production environment (about > 10 nodes). > However, if we want to change configuration or add patches, we would have to > restart all of them one by one. So all the Hive Sql job running on the server > will be defeated, and there may be some mistakes come up on the jdbc client > occasionally. > In the proposed changes, planning to add Graceful Shutdown HiveSever2 method > to avoid affecting the production environment jobs -- This message was sent by Atlassian Jira (v8.20.10#820010)