[ https://issues.apache.org/jira/browse/HIVE-16094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth updated HIVE-16094:
----------------------------------
    Attachment: HIVE-16094.01.patch

The problem was that if an AM's entry was picked up by the queueDrainer while it had 0 fragments, it would not be put back into the queue. registerFragment only added a new entry to the queue if the AM was not already known, so registering a new fragment for that AM did not schedule it again.

AMNodeInfo instances were originally meant to be shared across multiple queries belonging to an AM. We could still achieve that by going back to the old model of reference counting. However, I think it's cleaner to maintain one AMNodeInfo instance per query instance, so the patch changes the key to be the queryIdentifier. An AMNodeInfo instance is always maintained in the queue, and a heartbeat is only sent if there are pending fragments. The instance is removed from the queue after query completion, or if an error is hit.

cc [~prasanth_j] for review.

> queued containers may timeout if they don't get to run for a long time
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16094
>                 URL: https://issues.apache.org/jira/browse/HIVE-16094
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: HIVE-16094.01.patch
>
>
> I believe this happened after HIVE-15958, since we end up keeping amNodeInfo
> in knownAppMaters, and that can result in the callable not being scheduled on
> new task registration.
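The per-query keying described in the comment can be sketched as a simplified, single-threaded Java model. This is not the actual Hive AMReporter code: the class QueryHeartbeatQueue, its NodeInfo holder, and the methods unregisterFragment, queryComplete, drainOnce, heartbeatsSent and queueSize are hypothetical names. The queueDrainer is modeled as one drainOnce() pass, and sending a heartbeat is replaced by incrementing a counter.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/**
 * Hypothetical, simplified model of the fix described in HIVE-16094.
 * Not the real AMReporter API; names are illustrative only.
 */
class QueryHeartbeatQueue {

    /** Per-query state; stands in for one AMNodeInfo instance per queryIdentifier. */
    static final class NodeInfo {
        final String queryId;
        int pendingFragments;
        int heartbeatsSent;

        NodeInfo(String queryId) {
            this.queryId = queryId;
        }
    }

    // Keyed by query identifier rather than by AM, as the patch description states.
    private final Map<String, NodeInfo> knownQueries = new HashMap<>();
    // Stand-in for the queue that the queueDrainer polls.
    private final Queue<NodeInfo> queue = new ArrayDeque<>();

    /** Registers a fragment; the entry is created and enqueued exactly once per query. */
    public void registerFragment(String queryId) {
        NodeInfo info = knownQueries.computeIfAbsent(queryId, id -> {
            NodeInfo created = new NodeInfo(id);
            queue.add(created);
            return created;
        });
        info.pendingFragments++;
    }

    /** Marks one fragment of the query as no longer pending. */
    public void unregisterFragment(String queryId) {
        NodeInfo info = knownQueries.get(queryId);
        if (info != null && info.pendingFragments > 0) {
            info.pendingFragments--;
        }
    }

    /** Drops the entry after query completion (the same path would apply on an error). */
    public void queryComplete(String queryId) {
        NodeInfo info = knownQueries.remove(queryId);
        if (info != null) {
            queue.remove(info);
        }
    }

    /**
     * One drainer pass: every entry is put back into the queue, even with 0 fragments,
     * but a heartbeat is only counted when the query has pending fragments.
     */
    public void drainOnce() {
        int n = queue.size();
        for (int i = 0; i < n; i++) {
            NodeInfo info = queue.poll();
            if (info.pendingFragments > 0) {
                info.heartbeatsSent++; // placeholder for actually sending a heartbeat
            }
            queue.add(info); // always re-enqueue, which avoids the lost-entry bug
        }
    }

    public int heartbeatsSent(String queryId) {
        NodeInfo info = knownQueries.get(queryId);
        return info == null ? 0 : info.heartbeatsSent;
    }

    public int queueSize() {
        return queue.size();
    }
}
```

In this model, the scenario from the bug no longer loses the entry: a query whose fragment count drops to 0 is drained without a heartbeat but stays queued, so a fragment registered for it later is heartbeated on the next pass. Keying by query instead of by AM also avoids the reference counting that sharing one entry across queries would require.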