[ 
https://issues.apache.org/jira/browse/HIVE-10842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-10842:
------------------------------------
    Description: 
Looks exactly like HIVE-10744. Last comment there has internal app IDs. Logs 
upon request.
6 (number of slots) tasks from a machine are stuck.
jstack for target daemon sayeth:
{noformat}
Found one Java-level deadlock:
=============================

"IPC Server handler 4 on 15001":
  waiting to lock Monitor@0x00007f3cb0005cb8 (Object@0x000000008cc3ce98, a java/lang/Object),
  which is held by "Wait-Queue-Scheduler-0"
"Wait-Queue-Scheduler-0":
  waiting to lock Monitor@0x00007f3cb0004d98 (Object@0x000000009234cf58, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
  which is held by "IPC Server handler 4 on 15001"
{noformat}
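
The jstack output above has the shape of a lock-order inversion: each thread holds one monitor and waits for the monitor the other holds. As a minimal sketch of that pattern, and not the actual LLAP code, the following reproduces it with two placeholder locks and detects the cycle through the JDK's ThreadMXBean. The class name, thread names, and lock names (plainLock, trackerLock) are all hypothetical stand-ins for the java/lang/Object and the FinishableStateTracker in the trace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockSketch {

    // Starts a daemon thread that takes `first`, waits until both threads
    // hold their first monitor, then tries to take `second`.
    static void start(String name, Object first, Object second, CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread holds `second`.
                }
            }
        }, name);
        t.setDaemon(true); // lets the JVM exit despite the deadlock
        t.start();
    }

    // Forces the inversion, then polls the JVM for monitor deadlocks.
    // Returns the number of threads in the detected cycle, or 0 on timeout.
    static int countDeadlockedThreads() {
        Object plainLock = new Object();   // hypothetical: the java/lang/Object monitor
        Object trackerLock = new Object(); // hypothetical: the FinishableStateTracker monitor
        CountDownLatch latch = new CountDownLatch(2);
        // Same acquisition order as in the trace: the handler holds the
        // tracker and wants the plain lock; the scheduler does the reverse.
        start("ipc-handler-sketch", trackerLock, plainLock, latch);
        start("wait-queue-scheduler-sketch", plainLock, trackerLock, latch);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

The usual remedy for this pattern is to acquire the two monitors in one fixed order on every code path, or to avoid calling out while holding either; which of those applies here depends on code not shown in this report.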

Oh, this time it is not q1; I was running a bunch of TPC-DS queries in sequence 
for some cache test. No parallel queries. There may have been task failures 
before.
The query that got stuck had lots and lots of reducers:
{noformat}
Map 1: 1/1    Map 10: 1/1    Map 11: 85/85    Map 13: 1/1    Map 14: 1/1
Map 15: 1/1    Map 16: 1/1    Map 17: 94/94    Map 19: 1/1    Map 2: 1/1
Map 20: 1/1    Map 3: 91/91    Map 7: 1/1    Map 8: 1/1    Map 9: 1/1
Reducer 12: 391/391    Reducer 18: 197/197    Reducer 4: 1009/1009
Reducer 5: 1003(+6)/1009    Reducer 6: 0(+1)/1
{noformat}
I think it's query 58.
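
To pick out the vertices that still have tasks in flight from a progress line like the one above, a small parser can help. This is a rough sketch that assumes the format is "name: completed(+running)/total" with the "(+running)" part present only when tasks are running; the class and method names are hypothetical and not part of Hive or Tez.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProgressLineSketch {

    // Assumed shape: "<Map|Reducer> <id>: <completed>(+<running>)/<total>",
    // where "(+<running>)" is optional.
    private static final Pattern VERTEX =
        Pattern.compile("((?:Map|Reducer) \\d+): (\\d+)(?:\\(\\+(\\d+)\\))?/(\\d+)");

    // Returns vertex name -> running task count, only for vertices
    // that report running tasks, in the order they appear.
    static Map<String, Integer> runningTasks(String line) {
        Map<String, Integer> running = new LinkedHashMap<>();
        Matcher m = VERTEX.matcher(line);
        while (m.find()) {
            if (m.group(3) != null) {
                running.put(m.group(1), Integer.parseInt(m.group(3)));
            }
        }
        return running;
    }

    public static void main(String[] args) {
        String line = "Reducer 4: 1009/1009    Reducer 5: 1003(+6)/1009    Reducer 6: 0(+1)/1";
        System.out.println(runningTasks(line));
    }
}
```

Under that assumed format, the 6 running tasks on Reducer 5 line up with the 6 stuck tasks (the number of slots) mentioned at the top of this report.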

  was:
Looks exactly like HIVE-10744. Last comment there has internal app IDs. Logs 
upon request.
6 (number of slots) tasks from a machine are stuck.
jstack for target daemon sayeth:
{noformat}
Found one Java-level deadlock:
=============================

"IPC Server handler 4 on 15001":
  waiting to lock Monitor@0x00007f3cb0005cb8 (Object@0x000000008cc3ce98, a java/lang/Object),
  which is held by "Wait-Queue-Scheduler-0"
"Wait-Queue-Scheduler-0":
  waiting to lock Monitor@0x00007f3cb0004d98 (Object@0x000000009234cf58, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
  which is held by "IPC Server handler 4 on 15001"
{noformat}


> LLAP: DAGs get stuck in yet another way
> ---------------------------------------
>
>                 Key: HIVE-10842
>                 URL: https://issues.apache.org/jira/browse/HIVE-10842
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergey Shelukhin
>            Assignee: Siddharth Seth
>
> Looks exactly like HIVE-10744. Last comment there has internal app IDs. Logs 
> upon request.
> 6 (number of slots) tasks from a machine are stuck.
> jstack for target daemon sayeth:
> {noformat}
> Found one Java-level deadlock:
> =============================
>
> "IPC Server handler 4 on 15001":
>   waiting to lock Monitor@0x00007f3cb0005cb8 (Object@0x000000008cc3ce98, a java/lang/Object),
>   which is held by "Wait-Queue-Scheduler-0"
> "Wait-Queue-Scheduler-0":
>   waiting to lock Monitor@0x00007f3cb0004d98 (Object@0x000000009234cf58, a org/apache/hadoop/hive/llap/daemon/impl/QueryInfo$FinishableStateTracker),
>   which is held by "IPC Server handler 4 on 15001"
> {noformat}
> Oh, this time it is not q1; I was running a bunch of TPC-DS queries in sequence 
> for some cache test. No parallel queries. There may have been task failures 
> before.
> The query that got stuck had lots and lots of reducers:
> {noformat}
> Map 1: 1/1    Map 10: 1/1    Map 11: 85/85    Map 13: 1/1    Map 14: 1/1
> Map 15: 1/1    Map 16: 1/1    Map 17: 94/94    Map 19: 1/1    Map 2: 1/1
> Map 20: 1/1    Map 3: 91/91    Map 7: 1/1    Map 8: 1/1    Map 9: 1/1
> Reducer 12: 391/391    Reducer 18: 197/197    Reducer 4: 1009/1009
> Reducer 5: 1003(+6)/1009    Reducer 6: 0(+1)/1
> {noformat}
> I think it's query 58.


