Sergey Shelukhin created HIVE-10648:
---------------------------------------
Summary: LLAP: registry; Tez attempted to schedule to daemon that didn't exist
Key: HIVE-10648
URL: https://issues.apache.org/jira/browse/HIVE-10648
Project: Hive
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Gopal V

I can post logs externally; for now, the app IDs on the test cluster are application_1429683757595_0784 and application_1429683757595_0783. I also have the logs copied over.

The AM found the node (same logs for other nodes):
{noformat}
2015-05-07 12:13:28,074 INFO [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler] impl.LlapYarnRegistryImpl: Adding new worker 342f4992-2608-43ab-a119-b50882e35f75 which mapped to DynamicServiceInstance [alive=true, host=cn059-10.l42scl.hortonworks.com:15001 with resources=<memory:20480, vCores:6>]
....
2015-05-07 12:13:28,082 INFO [Dispatcher thread: Central] node.AMNodeTracker: Num cluster nodes = 19
{noformat}

Trouble is, this node never actually existed... The cluster only had 15 nodes.
As the job was progressing, the AM repeatedly tried to schedule to this node and failed.
There was no other LLAP cluster running at the same time. In fact, given that I always start a 15-node cluster, I am not sure where 19-node data could conceivably come from...