[ https://issues.apache.org/jira/browse/HIVE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799287#comment-13799287 ]
Chun Chen commented on HIVE-5575: --------------------------------- Thanks for reviewing the code [~brocknoland]. > ZooKeeper connection closed when unlock with retry > -------------------------------------------------- > > Key: HIVE-5575 > URL: https://issues.apache.org/jira/browse/HIVE-5575 > Project: Hive > Issue Type: Bug > Affects Versions: 0.11.0 > Reporter: Chun Chen > Assignee: Chun Chen > Fix For: 0.13.0 > > Attachments: D13515.1.patch, HIVE-5575.patch, zookeeper session > closed.png > > > See the attachment, I have encountered a scenario that hive retries to > unlock all locks, but zookeeper session is closed. If there are hundreds of > locks, say dynamic partition, the process will hang up for several days. > The stack is > {code} > Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode): > "Attach Listener" daemon prio=10 tid=0x000000000683f000 nid=0x34d0 waiting on > condition [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > Locked ownable synchronizers: > - None > "LeaseChecker" daemon prio=10 tid=0x0000000006693800 nid=0x2713 waiting on > condition [0x0000000042af7000] > java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376) > at java.lang.Thread.run(Thread.java:722) > Locked ownable synchronizers: > - None > "Service Thread" daemon prio=10 tid=0x00002aaab8001000 nid=0x2651 runnable > [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > Locked ownable synchronizers: > - None > "C2 CompilerThread1" daemon prio=10 tid=0x0000000005c7c800 nid=0x2650 waiting > on condition [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > Locked ownable synchronizers: > - None > "C2 CompilerThread0" daemon prio=10 tid=0x0000000005c71000 nid=0x264f waiting > on condition [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > Locked ownable synchronizers: > - None > "Signal Dispatcher" daemon prio=10 tid=0x0000000005c6f000 nid=0x264e runnable > [0x0000000000000000] > java.lang.Thread.State: RUNNABLE > Locked ownable synchronizers: > - None > "Finalizer" daemon prio=10 tid=0x0000000005c22000 nid=0x264d in Object.wait() > [0x00000000427f4000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135) > - locked <0x000000078324b110> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151) > at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189) > Locked ownable synchronizers: > - None > "Reference Handler" daemon prio=10 tid=0x0000000005c1a000 nid=0x264c in > Object.wait() [0x0000000041900000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > at java.lang.Object.wait(Object.java:503) > at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133) > - locked <0x000000078328fbc0> (a java.lang.ref.Reference$Lock) > Locked ownable synchronizers: > - None > "main" prio=10 tid=0x0000000005b76800 nid=0x263d waiting on condition > [0x0000000040f46000] > java.lang.Thread.State: TIMED_WAITING (sleeping) > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockWithRetry(ZooKeeperHiveLockManager.java:426) > at > org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlock(ZooKeeperHiveLockManager.java:415) > at > org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.releaseLocks(ZooKeeperHiveLockManager.java:257) > at org.apache.hadoop.hive.ql.Driver.releaseLocks(Driver.java:864) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:953) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348) > at > org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446) > at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.apache.hadoop.util.RunJar.main(RunJar.java:156) > Locked ownable synchronizers: > - None > "VM Thread" prio=10 tid=0x0000000005c12800 nid=0x264b runnable > "GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000005b84800 nid=0x263e > runnable > "GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000005b86000 nid=0x263f > runnable > "GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000005b88000 nid=0x2640 > runnable > "GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000005b8a000 nid=0x2641 > runnable > "GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000005b8b800 nid=0x2642 > runnable > "GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000005b8d800 nid=0x2643 > runnable > "GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000005b8f800 nid=0x2644 > runnable > "GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000005b91000 nid=0x2645 > runnable > "GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000005b93000 nid=0x2646 > runnable > "GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000005b95000 nid=0x2647 > runnable > "GC task thread#10 (ParallelGC)" prio=10 tid=0x0000000005b96800 nid=0x2648 > runnable > "GC task thread#11 (ParallelGC)" prio=10 tid=0x0000000005b98800 nid=0x2649 > runnable > "GC task thread#12 (ParallelGC)" prio=10 tid=0x0000000005b9a800 nid=0x264a > runnable > "VM Periodic Task Thread" prio=10 tid=0x00002aaab800c000 nid=0x2652 waiting > on condition > JNI global references: 294 > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)