[ 
https://issues.apache.org/jira/browse/HIVE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303670#comment-15303670
 ] 

Prasanth Jayachandran commented on HIVE-13858:
----------------------------------------------

Looks like RB went down while I was commenting, so I am adding the comments here:
1) CancellationException does not seem to be caught. Is it not expected?
2) I think we can remove the TODO about throwing HiveException and throw 
InterruptedException instead. IIRC throwing InterruptedException will also clear 
the interrupt status flag, so the Thread.interrupted() call is also not 
required. TezProcessor catches Throwable anyway, so it should be safe to throw 
InterruptedException.
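
On the interrupt-status point in 2), a small check may help before dropping the Thread.interrupted() call. As far as the JDK contract goes, blocking methods such as Thread.sleep clear the flag before they throw InterruptedException, whereas a hand-written "throw new InterruptedException()" does not touch the flag. The class and method names below are made up for illustration and are not part of the patch:

```java
public class InterruptFlagDemo {

    // Interrupt status observed after a JDK blocking call throws.
    static boolean flagAfterSleepThrows() {
        Thread.currentThread().interrupt();   // set the flag on this thread
        try {
            Thread.sleep(10);                 // throws at once because the flag is set
        } catch (InterruptedException e) {
            // sleep() cleared the flag before throwing
            return Thread.currentThread().isInterrupted();
        }
        // Only reached if sleep() returned normally; reports and clears the flag.
        return Thread.interrupted();
    }

    // Interrupt status observed after throwing the exception by hand.
    static boolean flagAfterManualThrow() {
        Thread.currentThread().interrupt();   // set the flag on this thread
        try {
            throw new InterruptedException("thrown manually");
        } catch (InterruptedException e) {
            // Thread.interrupted() reports the flag and clears it for the caller.
            return Thread.interrupted();
        }
    }

    public static void main(String[] args) {
        System.out.println("flag after sleep() throws: " + flagAfterSleepThrows());
        System.out.println("flag after manual throw:   " + flagAfterManualThrow());
    }
}
```

If the flag is still set after a manual throw, the explicit Thread.interrupted() call is what actually clears it, so it may be worth keeping unless the exception comes from a blocking JDK call.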

> LLAP: A preempted task can end up waiting on completeInitialization if some 
> part of the executing code suppressed the interrupt
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13858
>                 URL: https://issues.apache.org/jira/browse/HIVE-13858
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>              Labels: llap
>         Attachments: HIVE-13858.01.patch, HIVE-13858.02.patch
>
>
> An interrupt, along with a HiveProcessor.abort call, is made when attempting to 
> preempt a task.
> In this specific case, the task was in the middle of HDFS IO, which 
> 'handled' the interrupt by retrying. As a result, the interrupt status on the 
> thread was reset, so instead of skipping the future.get in 
> completeInitialization, the task ended up blocking there.
> End result: a single executor slot is permanently blocked in LLAP. Depending on 
> what else is running, this can cause a cluster-level deadlock.
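
The failure mode in the description can be sketched in isolation: once some lower layer has swallowed the interrupt, a plain future.get no longer sees it, so the wait has to consult a separate abort flag as well. This is only a rough illustration under assumed names (AbortAwareWait, abort, awaitInit, and the polling interval are hypothetical), and it is not necessarily what the attached patches do:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class AbortAwareWait {

    // Hypothetical stand-in for the state recorded by an abort() call.
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    void abort() {
        aborted.set(true);
    }

    // Waits on the future in short slices and re-checks the abort flag
    // between slices, so a lost interrupt cannot block the wait forever.
    <T> T awaitInit(Future<T> future, long sliceMillis)
            throws InterruptedException, ExecutionException {
        while (true) {
            if (aborted.get()) {
                throw new InterruptedException("aborted while waiting for initialization");
            }
            try {
                return future.get(sliceMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Not done yet: loop around and re-check the abort flag.
            }
        }
    }

    // Returns true if the wait was abandoned even though the interrupt was lost.
    static boolean demo() throws ExecutionException {
        AbortAwareWait task = new AbortAwareWait();
        Future<String> neverDone = new CompletableFuture<>(); // never completed

        // Preemption: interrupt the thread and call abort().
        Thread.currentThread().interrupt();
        task.abort();
        // Simulates IO code that 'handles' the interrupt by retrying,
        // which resets the thread's interrupt status.
        Thread.interrupted();

        try {
            task.awaitInit(neverDone, 50);
            return false;
        } catch (InterruptedException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("wait abandoned after abort: " + demo());
    }
}
```

Polling with a timeout is just one way to make the wait abort-aware; having abort() cancel the future directly would be another option, at the cost of handling CancellationException in the waiter.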



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
