[ https://issues.apache.org/jira/browse/HIVE-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun Chen updated HIVE-5575:
----------------------------

    Description: 
See the attachment. I have encountered a scenario in which Hive retries to
unlock all locks, but the ZooKeeper session is already closed. If there are
hundreds of locks (for example, from dynamic partitions), the process hangs
for several days.
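As a rough illustration of why the hang can last for days: if each lock is retried a fixed number of times with a fixed sleep between attempts, and every attempt fails because the session is gone, the total blocking time grows with the product of the lock count, the retry count and the sleep interval. The sketch below assumes 10 unlock retries and a 60-second sleep (believed to be the defaults of `hive.unlock.numretries` and `hive.lock.sleep.between.retries` in this era of Hive, but treat both as assumptions) and a hypothetical 500 locks. It counts one full sleep per retry, so it is an upper-bound approximation rather than a model of Hive's exact loop.

```java
// Rough worst-case estimate for a serial unlock-with-retry loop in which every
// attempt fails (for example because the ZooKeeper session is already closed).
// The retry count and sleep interval are ASSUMED values, not read from a HiveConf.
public class UnlockWaitEstimate {

    // Upper-bound approximation: every lock uses all of its retries and each
    // retry costs one full sleep interval.
    static long worstCaseSeconds(long numLocks, long numRetries, long sleepSeconds) {
        return numLocks * numRetries * sleepSeconds;
    }

    public static void main(String[] args) {
        long numLocks = 500;    // hypothetical: "hundreds of locks" from dynamic partitions
        long numRetries = 10;   // assumed value of hive.unlock.numretries
        long sleepSeconds = 60; // assumed value of hive.lock.sleep.between.retries
        long total = worstCaseSeconds(numLocks, numRetries, sleepSeconds);
        System.out.printf("%d s, about %.1f days%n", total, total / 86400.0);
    }
}
```

With these assumed numbers the estimate is 300,000 seconds, roughly 3.5 days, which matches the scale of the reported hang.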

The stack trace is:
{code}
Full thread dump Java HotSpot(TM) 64-Bit Server VM (23.21-b01 mixed mode):

"Attach Listener" daemon prio=10 tid=0x000000000683f000 nid=0x34d0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"LeaseChecker" daemon prio=10 tid=0x0000000006693800 nid=0x2713 waiting on condition [0x0000000042af7000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
        at java.lang.Thread.run(Thread.java:722)

   Locked ownable synchronizers:
        - None

"Service Thread" daemon prio=10 tid=0x00002aaab8001000 nid=0x2651 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"C2 CompilerThread1" daemon prio=10 tid=0x0000000005c7c800 nid=0x2650 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"C2 CompilerThread0" daemon prio=10 tid=0x0000000005c71000 nid=0x264f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Signal Dispatcher" daemon prio=10 tid=0x0000000005c6f000 nid=0x264e runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

   Locked ownable synchronizers:
        - None

"Finalizer" daemon prio=10 tid=0x0000000005c22000 nid=0x264d in Object.wait() [0x00000000427f4000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
        - locked <0x000000078324b110> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
        at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)

   Locked ownable synchronizers:
        - None

"Reference Handler" daemon prio=10 tid=0x0000000005c1a000 nid=0x264c in Object.wait() [0x0000000041900000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
        - locked <0x000000078328fbc0> (a java.lang.ref.Reference$Lock)

   Locked ownable synchronizers:
        - None

"main" prio=10 tid=0x0000000005b76800 nid=0x263d waiting on condition [0x0000000040f46000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlockWithRetry(ZooKeeperHiveLockManager.java:426)
        at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.unlock(ZooKeeperHiveLockManager.java:415)
        at org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager.releaseLocks(ZooKeeperHiveLockManager.java:257)
        at org.apache.hadoop.hive.ql.Driver.releaseLocks(Driver.java:864)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:953)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:712)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

   Locked ownable synchronizers:
        - None

"VM Thread" prio=10 tid=0x0000000005c12800 nid=0x264b runnable 

"GC task thread#0 (ParallelGC)" prio=10 tid=0x0000000005b84800 nid=0x263e runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0000000005b86000 nid=0x263f runnable

"GC task thread#2 (ParallelGC)" prio=10 tid=0x0000000005b88000 nid=0x2640 runnable

"GC task thread#3 (ParallelGC)" prio=10 tid=0x0000000005b8a000 nid=0x2641 runnable

"GC task thread#4 (ParallelGC)" prio=10 tid=0x0000000005b8b800 nid=0x2642 runnable

"GC task thread#5 (ParallelGC)" prio=10 tid=0x0000000005b8d800 nid=0x2643 runnable

"GC task thread#6 (ParallelGC)" prio=10 tid=0x0000000005b8f800 nid=0x2644 runnable

"GC task thread#7 (ParallelGC)" prio=10 tid=0x0000000005b91000 nid=0x2645 runnable

"GC task thread#8 (ParallelGC)" prio=10 tid=0x0000000005b93000 nid=0x2646 runnable

"GC task thread#9 (ParallelGC)" prio=10 tid=0x0000000005b95000 nid=0x2647 runnable

"GC task thread#10 (ParallelGC)" prio=10 tid=0x0000000005b96800 nid=0x2648 runnable

"GC task thread#11 (ParallelGC)" prio=10 tid=0x0000000005b98800 nid=0x2649 runnable

"GC task thread#12 (ParallelGC)" prio=10 tid=0x0000000005b9a800 nid=0x264a runnable

"VM Periodic Task Thread" prio=10 tid=0x00002aaab800c000 nid=0x2652 waiting on condition

JNI global references: 294
{code}
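At the time of the dump, the `main` thread is sleeping inside `ZooKeeperHiveLockManager.unlockWithRetry`, called from `releaseLocks`, so it is waiting out retry back-off rather than doing ZooKeeper I/O. One way to avoid paying the full retry cost for every remaining lock is to treat a closed session as non-retryable and abort the whole batch. The following is a minimal, hypothetical Java sketch of that idea: `Unlocker`, `SessionClosedException`, `releaseAll` and the lock paths are illustrative names, not Hive or ZooKeeper APIs, and the sketch does not claim to reproduce the attached patch (D13515.1.patch). It assumes the locks are ephemeral ZooKeeper nodes, which the server removes on its own once the owning session ends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: release many locks with per-lock retries, but treat a
// closed session as fatal for the whole batch. None of these names are Hive
// or ZooKeeper APIs.
public class FailFastUnlock {

    // Hypothetical signal that the coordination session is gone for good.
    static class SessionClosedException extends Exception {
        SessionClosedException(String msg) { super(msg); }
    }

    // Hypothetical unlock primitive; a real one might delete a ZooKeeper node.
    interface Unlocker {
        void unlock(String lockPath) throws Exception;
    }

    // Tries one lock up to maxRetries times with a sleep between attempts.
    // Returns true on success, false if every attempt failed transiently,
    // and rethrows SessionClosedException immediately without sleeping.
    static boolean unlockWithRetry(Unlocker u, String path, int maxRetries, long sleepMillis)
            throws SessionClosedException, InterruptedException {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                u.unlock(path);
                return true;
            } catch (SessionClosedException e) {
                throw e; // retrying cannot succeed on a closed session
            } catch (Exception e) {
                if (attempt < maxRetries) {
                    Thread.sleep(sleepMillis); // back off before the next attempt
                }
            }
        }
        return false;
    }

    // Releases locks in order and returns the ones actually released.
    // A closed session or an interrupt stops the loop instead of letting
    // every remaining lock run through its full retry budget.
    static List<String> releaseAll(Unlocker u, List<String> paths, int maxRetries, long sleepMillis) {
        List<String> released = new ArrayList<>();
        for (String p : paths) {
            try {
                if (unlockWithRetry(u, p, maxRetries, sleepMillis)) {
                    released.add(p);
                }
            } catch (SessionClosedException e) {
                break; // every remaining unlock would fail the same way
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                break;
            }
        }
        return released;
    }

    public static void main(String[] args) {
        // Hypothetical lock paths standing in for dynamic-partition locks.
        List<String> paths = List.of("/hive/db/tbl/p=1", "/hive/db/tbl/p=2", "/hive/db/tbl/p=3");
        // Simulate an unlock primitive whose session is already closed.
        Unlocker closed = p -> { throw new SessionClosedException("session closed"); };
        System.out.println(releaseAll(closed, paths, 10, 60_000).size()); // prints 0 without sleeping
    }
}
```

With a closed session, `releaseAll` returns after a single attempt instead of sleeping through every retry for every lock, while a transient failure still gets the normal retry budget.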

  was: See the attachment. I have encountered a scenario in which Hive retries
to unlock all locks, but the ZooKeeper session is already closed. If there are
hundreds of locks (for example, from dynamic partitions), the process hangs
for several days.


> ZooKeeper connection closed when unlock with retry
> --------------------------------------------------
>
>                 Key: HIVE-5575
>                 URL: https://issues.apache.org/jira/browse/HIVE-5575
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Chun Chen
>            Assignee: Chun Chen
>             Fix For: 0.13.0
>
>         Attachments: D13515.1.patch, zookeeper session closed.png
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)
