[ https://issues.apache.org/jira/browse/HIVE-14979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591664#comment-15591664 ]
Peter Vary commented on HIVE-14979:
-----------------------------------

I totally agree with you [~sershe]!

Here is what I know at the moment, thanks to the people who helped out with extra info:
- There is one configuration value for the ZooKeeper timeout (HIVE_ZOOKEEPER_SESSION_TIMEOUT), used by both service discovery and the locks. It is set to 20 minutes by default, and may be capped at a lower value by the ZooKeeper maxSessionTimeout setting.
- If HiveServer2 is shut down by normal means, it removes the ZooKeeper nodes as expected (at least I have yet to find an example that contradicts this).
- If HiveServer2 dies unexpectedly, ZooKeeper correctly removes the ephemeral nodes, but only after the session timeout is reached. With the default configuration this could take 20 minutes.
- The patch proposes a configuration option which, if enabled, removes the remaining ZooKeeper lock nodes at HiveServer2 startup even if the ZooKeeper session timeout has not been reached.
- So far I have read one quite good reason for the large timeout (see the comment by [~thejas], and http://stackoverflow.com/questions/14275613/concerns-about-zookeepers-lock-recipe). The session timeout relies on ping messages, so a long GC pause or network congestion could cause the session to be terminated. ZooKeeper tries to ping an idle connection after 1/3 of the timeout, so the longer the timeout, the less likely it is that a session gets terminated overzealously :).

I do not know enough about the external jobs yet, but I also think the remaining jobs could be a problem. All in all, solving this with an increased timeout does not strike me as a good solution: queries in Hive can be huge and can run for hours or days, so a 20-minute timeout still does not solve the problem.

Am I right here, or am I missing some important points?
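To make the timeout arithmetic above concrete, here is a minimal sketch in Java. It is not code from Hive or ZooKeeper: the class and method names are hypothetical, and the 2000 ms tickTime is an assumed server setting. It only illustrates two points: a requested session timeout is clamped into the server's [minSessionTimeout, maxSessionTimeout] range (which the ZooKeeper documentation gives as 2x and 20x tickTime when not set explicitly), and an idle connection is pinged after roughly 1/3 of the resulting timeout.

```java
public class SessionTimeoutSketch {

    /** Clamp the timeout requested by the client into the server's [min, max] range. */
    public static long effectiveTimeoutMs(long requestedMs, long minMs, long maxMs) {
        return Math.max(minMs, Math.min(requestedMs, maxMs));
    }

    /** Idle time after which a ping is sent: roughly one third of the session timeout. */
    public static long idlePingIntervalMs(long sessionTimeoutMs) {
        return sessionTimeoutMs / 3;
    }

    public static void main(String[] args) {
        // Hive side: the 20 minute default mentioned above.
        long requestedMs = 20L * 60 * 1000;

        // Server side: ASSUMED tickTime of 2000 ms, with min/max left at
        // their documented defaults of 2x and 20x tickTime.
        long tickTimeMs = 2000;
        long minMs = 2 * tickTimeMs;
        long maxMs = 20 * tickTimeMs;

        long effectiveMs = effectiveTimeoutMs(requestedMs, minMs, maxMs);
        System.out.println("requested timeout (ms): " + requestedMs);
        System.out.println("effective timeout (ms): " + effectiveMs);
        System.out.println("idle ping after   (ms): " + idlePingIntervalMs(effectiveMs));
    }
}
```

Under these assumed server settings the 20 minutes requested by Hive would be capped at 40 seconds, so how long a crashed HiveServer2 keeps its ephemeral nodes depends on the server's maxSessionTimeout as much as on the Hive setting.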

Thanks,
Peter

> Removing stale Zookeeper locks at HiveServer2 initialization
> ------------------------------------------------------------
>
>                 Key: HIVE-14979
>                 URL: https://issues.apache.org/jira/browse/HIVE-14979
>             Project: Hive
>          Issue Type: Improvement
>          Components: Locking
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>         Attachments: HIVE-14979.3.patch, HIVE-14979.4.patch, HIVE-14979.patch
>
>
> HiveServer2 could use Zookeeper to store tokens that indicate that particular
> tables are locked, by creating persistent Zookeeper objects.
> A problem can occur when a HiveServer2 instance creates a lock on a table and
> the HiveServer2 instance then crashes ("Out of Memory" for example), so the
> locks are not released in Zookeeper. Such a lock will then remain until it is
> manually cleared by an admin.
> There should be a way to remove stale locks at HiveServer2 initialization,
> making the admins' lives easier.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)