[ https://issues.apache.org/jira/browse/HIVE-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533486#comment-14533486 ]
Sergey Shelukhin commented on HIVE-10648:
-----------------------------------------

Hmm, that's true actually; and cluster had 16 nodes. It appears that one node didn't exist or was picked up wrong

> LLAP: registry; Tez attempted to schedule to daemon that didn't exist
> ---------------------------------------------------------------------
>
> Key: HIVE-10648
> URL: https://issues.apache.org/jira/browse/HIVE-10648
> Project: Hive
> Issue Type: Sub-task
> Reporter: Sergey Shelukhin
> Assignee: Gopal V
>
> I can post logs externally; for now app IDs on test cluster are
> application_1429683757595_0784 and application_1429683757595_0783, I also
> have logs copied over.
> AM found the node (same logs for other nodes):
> {noformat}
> 2015-05-07 12:13:28,074 INFO
> [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerEventHandler]
> impl.LlapYarnRegistryImpl: Adding new worker
> 342f4992-2608-43ab-a119-b50882e35f75 which mapped to DynamicServiceInstance
> [alive=true, host=cn059-10.l42scl.hortonworks.com:15001 with
> resources=<memory:20480, vCores:6>]
> ....
> 2015-05-07 12:13:28,082 INFO [Dispatcher thread: Central] node.AMNodeTracker:
> Num cluster nodes = 19
> {noformat}
> Trouble is, this node never actually existed... The cluster only had 15
> nodes.
> As the job was progressing, AM repeatedly tried to schedule to this node and
> failed. There was no other LLAP cluster running at the same time.
> In fact, given that I always start a 15-node cluster I am not sure where
> 19-node data could conceivably come from...

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)