[ https://issues.apache.org/jira/browse/HIVE-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459610#comment-15459610 ]
Sergey Shelukhin commented on HIVE-14608:
-----------------------------------------

[~sseth] I can actually see problems because of this. Easy repro - start LLAP (e.g. 7 nodes), start the session (with AM), flex LLAP down (e.g. to 4), run some query. There can be a large delay in scheduling and the whole job can slow down a lot because nodes are not removed from instanceToNodeMap...
{noformat}
2016-09-02 16:51:41,428 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |tezplugins.LlapTaskSchedulerService|: Setting up node: DynamicServiceInstance [alive=true, host=cn109... with resources=<memory:83968, vCores:16>, shufflePort=15551, servicesAddress=..., mgmtPort=15004] with available capacity=16, pendingQueueSize=null, memory=83968
... (of course nothing is actually removed)
2016-09-02 16:52:01,490 [INFO] [StateChangeNotificationHandler] |tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|: Removed node with identity: f9b37b46-f629-4460-862f-f34183ba0a24
2016-09-02 16:52:01,567 [INFO] [StateChangeNotificationHandler] |tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|: Removed node with identity: 12399334-c743-4a9b-8224-8c0cbc21dea7
2016-09-02 16:52:01,776 [INFO] [StateChangeNotificationHandler] |tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|: Removed node with identity: c7b50156-b4f9-4353-89a4-3d1a1ccea604
...
2016-09-02 16:53:39,511 [INFO] [LlapScheduler] |tezplugins.LlapTaskSchedulerService|: Assigned task TaskInfo{task=attempt_1466700718395_1343_2_07_000000_1, priority=140, startTime=0, containerId=null, assignedInstance=null, uniqueId=24, localityDelayTimeout=0} to container container_222212222_1343_01_000025 on node=DynamicServiceInstance [alive=true, host=cn109... with resources=<memory:83968, vCores:16>, shufflePort=15551, servicesAddress=..., mgmtPort=15004]
{noformat}

> LLAP: slow scheduling due to LlapTaskScheduler not removing nodes on kill
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14608
>                 URL: https://issues.apache.org/jira/browse/HIVE-14608
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Siddharth Seth
>
> ...and presumably doesn't disable them for scheduling. I haven't looked in
> detail though, I just see some harmless killed tasks in queries after I kill
> some LLAP nodes manually between queries
> {noformat}
>   public void workerNodeRemoved(ServiceInstance serviceInstance) {
>     // FIXME: disabling this for now
>     // instanceToNodeMap.remove(serviceInstance.getWorkerIdentity());
> {noformat}
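
For illustration only, here is a minimal, self-contained sketch of the bookkeeping that the commented-out line in the quoted snippet would perform. It is not the actual LlapTaskSchedulerService code and not a proposed patch. The class NodeRegistrySketch, the String-based identities and hosts, and the schedulableHosts() helper are all hypothetical; only the names instanceToNodeMap and workerNodeRemoved come from the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the scheduler's node tracking; simplified to plain strings.
class NodeRegistrySketch {
    // Worker identity -> host. Plays the role of instanceToNodeMap from the issue.
    private final Map<String, String> instanceToNodeMap = new ConcurrentHashMap<>();

    // Called when a worker node registers (assumed counterpart of workerNodeRemoved).
    void workerNodeCreated(String workerIdentity, String host) {
        instanceToNodeMap.put(workerIdentity, host);
    }

    // Drops the node so it is no longer considered for scheduling.
    // Returns true if the identity was known, false otherwise.
    boolean workerNodeRemoved(String workerIdentity) {
        return instanceToNodeMap.remove(workerIdentity) != null;
    }

    // Hosts still eligible for task assignment, sorted for stable output.
    List<String> schedulableHosts() {
        List<String> hosts = new ArrayList<>(instanceToNodeMap.values());
        Collections.sort(hosts);
        return hosts;
    }
}
```

With the removal in place, a node that has been flexed down stops showing up in schedulableHosts(). If the remove call is skipped, as in the quoted FIXME, the stale entry stays in the map and a scheduler built on it could keep treating that host as a candidate, which would be consistent with the delayed assignment in the log above.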