[jira] [Updated] (HIVE-16094) queued containers may timeout if they don't get to run for a long time

Siddharth Seth (JIRA) Thu, 02 Mar 2017 15:42:13 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Siddharth Seth updated HIVE-16094:
----------------------------------
    Attachment: HIVE-16094.01.patch

The problem was that if an AM was picked up by the queueDrainer while it had 0 
fragments, it was not put back on the queue. registerFragment only adds a new 
entry to the queue if the AM is not already known, so a known AM was never 
re-queued.

AMNodeInfo instances were originally meant to be used across multiple queries 
belonging to an AM. We could still achieve that by going back to the old model 
of reference counting.

However, I think it's cleaner to maintain one AMNodeInfo instance per query. 
So the patch changes the key to be the queryIdentifier. An AMNodeInfo instance 
is always kept in the queue, and a heartbeat is only sent if there are pending 
fragments. The instance is removed from the queue after query completion, or 
if an error is hit.
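
To illustrate the model described above, here is a minimal sketch. It is not 
the code from the patch. The class name PerQueryHeartbeatSketch, the simplified 
AmNodeInfo stand-in, and the methods drainOnce, unregisterFragment and 
queryComplete are hypothetical names chosen for the example, and a plain FIFO 
queue stands in for the real scheduling, so timing and threading are ignored. 
It only shows the three rules: one entry per queryIdentifier, the entry always 
goes back on the queue, and a heartbeat is sent only when fragments are pending.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch only; names and structure are assumptions, not Hive code.
final class PerQueryHeartbeatSketch {

  // Hypothetical stand-in for AMNodeInfo: one instance per query.
  static final class AmNodeInfo {
    final String queryIdentifier;
    int pendingFragments;

    AmNodeInfo(String queryIdentifier) {
      this.queryIdentifier = queryIdentifier;
    }
  }

  // Keyed by queryIdentifier rather than by AM.
  private final Map<String, AmNodeInfo> known = new HashMap<>();
  private final Queue<AmNodeInfo> queue = new ArrayDeque<>();
  // Records which queries a heartbeat was "sent" for, so the sketch is observable.
  final List<String> heartbeatsSent = new ArrayList<>();

  void registerFragment(String queryIdentifier) {
    AmNodeInfo info = known.get(queryIdentifier);
    if (info == null) {
      // First fragment of this query: create the entry and queue it once.
      info = new AmNodeInfo(queryIdentifier);
      known.put(queryIdentifier, info);
      queue.add(info);
    }
    info.pendingFragments++;
  }

  void unregisterFragment(String queryIdentifier) {
    AmNodeInfo info = known.get(queryIdentifier);
    if (info != null && info.pendingFragments > 0) {
      info.pendingFragments--;
    }
  }

  // Called on query completion (or on error): the entry leaves the queue.
  void queryComplete(String queryIdentifier) {
    AmNodeInfo info = known.remove(queryIdentifier);
    if (info != null) {
      queue.remove(info);
    }
  }

  // One pass of the drainer over the entries currently queued.
  void drainOnce() {
    int entries = queue.size();
    for (int i = 0; i < entries; i++) {
      AmNodeInfo info = queue.poll();
      if (info.pendingFragments > 0) {
        heartbeatsSent.add(info.queryIdentifier);
      }
      // Always put the entry back, even with 0 pending fragments.
      queue.add(info);
    }
  }

  int queuedEntries() {
    return queue.size();
  }

  public static void main(String[] args) {
    PerQueryHeartbeatSketch reporter = new PerQueryHeartbeatSketch();
    reporter.registerFragment("query-1");
    reporter.unregisterFragment("query-1");
    reporter.drainOnce();                  // 0 pending: no heartbeat, still queued
    reporter.registerFragment("query-1");
    reporter.drainOnce();                  // 1 pending: heartbeat sent
    System.out.println(reporter.heartbeatsSent + " queued=" + reporter.queuedEntries());
    reporter.queryComplete("query-1");
    System.out.println("queued=" + reporter.queuedEntries());
  }
}
```

Because the entry stays queued while it has 0 fragments, a later 
registerFragment call for the same query does not need to re-add it.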

cc [~prasanth_j] for review.

> queued containers may timeout if they don't get to run for a long time
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16094
>                 URL: https://issues.apache.org/jira/browse/HIVE-16094
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: HIVE-16094.01.patch
>
>
> I believe this happened after HIVE-15958, since we end up keeping amNodeInfo 
> in knownAppMasters, and that can result in the callable not being scheduled on 
> new task registration.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
