[ 
https://issues.apache.org/jira/browse/HIVE-22193?focusedWorklogId=792995&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792995
 ]

ASF GitHub Bot logged work on HIVE-22193:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 20/Jul/22 02:35
            Start Date: 20/Jul/22 02:35
    Worklog Time Spent: 10m 
      Work Description: sourabh912 commented on code in PR #3386:
URL: https://github.com/apache/hive/pull/3386#discussion_r924864971


##########
service/src/java/org/apache/hive/service/CompositeService.java:
##########
@@ -94,6 +94,19 @@ public synchronized void stop() {
     super.stop();
   }
 
+  @Override
+  public synchronized void decommission() {

Review Comment:
   nit: add a documentation that services are decommissioned in the reverse 
order in which they were added. 



##########
service/src/java/org/apache/hive/service/server/HiveServer2.java:
##########
@@ -895,6 +897,60 @@ private void stopOrDisconnectTezSessions() {
     }
   }
 
+  /**
+   * Decommission HiveServer2. As a consequence, SessionManager stops
+   * opening new sessions, OperationManager refuses running new queries and
+   * HiveServer2 deregisters itself from Zookeeper if service discovery is 
enabled,
+   * but the decommissioning has no effect on the current running queries.
+   */
+  public synchronized void decommission() {
+    LOG.info("Decommissioning HiveServer2");
+    // Remove this server instance from ZooKeeper if dynamic service discovery 
is set
+    if (zooKeeperHelper != null && !isDeregisteredWithZooKeeper()) {
+      try {
+        zooKeeperHelper.removeServerInstanceFromZooKeeper();
+      } catch (Exception e) {
+        LOG.error("Error removing znode for this HiveServer2 instance from 
ZooKeeper.", e);
+      }
+    }
+    super.decommission();
+  }
+
+  public synchronized void graceful_stop() {
+    try {
+      decommission();
+      long maxTimeForWait = HiveConf.getTimeVar(getHiveConf(),
+              HiveConf.ConfVars.HIVE_SERVER2_GRACEFUL_STOP_TIMEOUT, 
TimeUnit.MILLISECONDS);
+      if (maxTimeForWait > 0) {
+        ExecutorService service = Executors.newSingleThreadExecutor();
+        Future future = service.submit(() -> {
+          // For gracefully stopping, sleeping some time while looping does 
not bring much overhead,
+          // that is, at most 100ms are wasted for waiting for 
OperationManager to be done,
+          // and this code path will only be executed when HS2 is being 
terminated.
+          long sleepInterval = Math.min(100, maxTimeForWait);
+          while (getCliService() != null && getCliService().getSessionManager()
+                  .getOperations().size() != 0) {

Review Comment:
   Should a better check be 
   ```
   while(getCliService() != null && getCliService().getSessionManager() != null 
&& getCliService().getSessionManager().getOperations().size() > 0) {}
   ```



##########
service/src/java/org/apache/hive/service/CompositeService.java:
##########
@@ -94,6 +94,19 @@ public synchronized void stop() {
     super.stop();
   }
 
+  @Override
+  public synchronized void decommission() {

Review Comment:
   Also return early if the service is already decommissioned? 
   
   ```
   if (this.getServiceState() == State.DECOMMISSIONING) {
     return
   }
   ```



##########
service/src/java/org/apache/hive/service/server/HiveServer2.java:
##########
@@ -895,6 +897,46 @@ private void stopOrDisconnectTezSessions() {
     }
   }
 
+  public synchronized void decommission() {
+    LOG.info("Decommissioning HiveServer2");
+    // Remove this server instance from ZooKeeper if dynamic service discovery 
is set
+    if (serviceDiscovery && !activePassiveHA && zooKeeperHelper != null) {
+      try {
+        zooKeeperHelper.removeServerInstanceFromZooKeeper();
+      } catch (Exception e) {
+        LOG.error("Error removing znode for this HiveServer2 instance from 
ZooKeeper.", e);
+      }
+    }
+    super.decommission();
+  }
+
+  public synchronized void graceful_stop() {
+    try {
+      decommission();
+      long maxTimeForWait = HiveConf.getTimeVar(getHiveConf(),
+              HiveConf.ConfVars.HIVE_SERVER2_GRACEFUL_STOP_TIMEOUT, 
TimeUnit.MILLISECONDS);
+      if (maxTimeForWait > 0) {
+        ExecutorService service = Executors.newSingleThreadExecutor();
+        Future future = service.submit(() -> {
+          while (getCliService() != null && getCliService().getSessionManager()

Review Comment:
   Thank you for addressing my comment. 
   Any particular reason why we need a separate thread to wait on operations to 
finish? Why can't it be done in same thread calling `graceful_stop` ? 



##########
service/src/java/org/apache/hive/service/AbstractService.java:
##########
@@ -125,7 +145,6 @@ public synchronized void stop() {
       // started (eg another service failing canceled startup)
       return;
     }
-    ensureCurrentState(STATE.STARTED);

Review Comment:
   Understood. Instead of removing the check, shall we change it to one 
mentioned in my previous comment?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 792995)
    Time Spent: 5h  (was: 4h 50m)

> Graceful Shutdown HiveServer2
> -----------------------------
>
>                 Key: HIVE-22193
>                 URL: https://issues.apache.org/jira/browse/HIVE-22193
>             Project: Hive
>          Issue Type: Improvement
>          Components: Server Infrastructure
>            Reporter: chenshiyun
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> We have a lot of HiveSever2 servers deployed on production environment (about 
> 10 nodes). 
> However, if we want to change configuration or add patches, we would have to 
> restart all of them one by one. So all the Hive Sql job running on the server 
> will be defeated, and there may be some mistakes come up on the jdbc client 
> occasionally.
> In the proposed changes,  planning to add Graceful Shutdown HiveSever2 method 
> to avoid affecting the production environment jobs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to