[ https://issues.apache.org/jira/browse/HIVE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Chung updated HIVE-24713:
--------------------------------
    Summary: HS2 never shut down after reconnecting to Zookeeper  (was: HS2 never knows deregistering from Zookeeper in the particular case)

> HS2 never shut down after reconnecting to Zookeeper
> ---------------------------------------------------
>
>                 Key: HIVE-24713
>                 URL: https://issues.apache.org/jira/browse/HIVE-24713
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Eugene Chung
>            Assignee: Eugene Chung
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> While using ZooKeeper discovery mode, HS2 consistently fails to learn that
> it has been deregistered from ZooKeeper.
> Reproduction is simple.
> # Find one of the ZK servers which holds the DeRegisterWatcher watches of
> HS2 instances. If the version of the ZK server is 3.5.0 or above, it is
> easily found with [http://zk-server:8080/commands/watches] (ZK AdminServer
> feature)
> # Check which HS2 instance is watching on the ZK server found in step 1,
> say it's _hs2-of-2_
> # Restart the ZK server found in step 1
> # Deregister _hs2-of-2_ with the command
> {noformat}
> hive --service hiveserver2 -deregister hs2-of-2{noformat}
> # _hs2-of-2_ never learns that it must shut down, because the watch event
> of DeRegisterWatcher was already fired in step 3.
> The reason for the problem is explained at
> [https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#sc_WatchRememberThese]
> I added some logging to DeRegisterWatcher and checked which events
> occurred in step 3 (the restart of the ZK server):
> # WatchedEvent state:Disconnected type:None path:null
> # WatchedEvent[WatchedEvent state:SyncConnected type:None path:null]
> # WatchedEvent[WatchedEvent state:SaslAuthenticated type:None path:null]
> # WatchedEvent[WatchedEvent state:SyncConnected type:NodeDataChanged
> path:/hiveserver2/serverUri=hs2-of-2:10000;version=3.1.2;sequence=0000000711]
> As the ZK manual says, watches are one-time triggers. When the connection to
> the ZK server is reestablished, state:SyncConnected type:NodeDataChanged is
> fired for the path, and that consumes the watch. *DeRegisterWatcher must be
> registered again on the same znode to receive a future NodeDeleted event.*
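
The one-time-trigger behavior described above can be illustrated with a small, self-contained sketch. It does not use the real ZooKeeper client API and is not the actual Hive patch: `ToyZk` and `ToyDeregisterWatcher` are hypothetical names for a toy in-memory model, and the znode path is shortened for readability. It only shows why a watcher that does not re-register after NodeDataChanged misses the later NodeDeleted event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy in-memory stand-in for a ZK server (assumption: NOT the ZooKeeper API). */
class ToyZk {
    // path -> watchers waiting for the next event on that path
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    /** Registers a one-shot watch on the given path. */
    void watch(String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    /** Delivers an event; every watch currently set on the path is consumed. */
    void fire(String path, String eventType) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired == null) {
            return; // nobody is watching any more
        }
        for (Consumer<String> watcher : fired) {
            watcher.accept(eventType);
        }
    }
}

/** Hypothetical watcher modeled loosely on the behavior described in the issue. */
class ToyDeregisterWatcher implements Consumer<String> {
    private final ToyZk zk;
    private final String path;
    private final boolean rearm; // true = register again after non-delete events
    boolean shutdownRequested = false;

    ToyDeregisterWatcher(ToyZk zk, String path, boolean rearm) {
        this.zk = zk;
        this.path = path;
        this.rearm = rearm;
    }

    @Override
    public void accept(String eventType) {
        if ("NodeDeleted".equals(eventType)) {
            shutdownRequested = true; // the znode is gone: time to shut down
            return;
        }
        if (rearm) {
            zk.watch(path, this); // the watch was consumed, so set it again
        }
    }
}
```

Registering one watcher with `rearm = false` and one with `rearm = true` on the same path, then firing NodeDataChanged followed by NodeDeleted, leaves only the re-armed watcher with `shutdownRequested` set. This mirrors the sequence observed when the ZK server is restarted before the deregister command is issued.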