[jira] [Commented] (HIVE-14608) LLAP: slow scheduling due to LlapTaskScheduler not removing nodes on kill

Sergey Shelukhin (JIRA) Fri, 02 Sep 2016 14:15:46 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-14608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459610#comment-15459610
 ]


Sergey Shelukhin commented on HIVE-14608:
-----------------------------------------

[~sseth] I can actually see problems because of this. Easy repro: start LLAP 
(e.g. with 7 nodes), start a session (with an AM), flex LLAP down (e.g. to 4 
nodes), then run some query. There can be a large delay in scheduling, and the 
whole job can slow down a lot, because nodes are not removed from 
instanceToNodeMap...
{noformat}
2016-09-02 16:51:41,428 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|tezplugins.LlapTaskSchedulerService|: Setting up node: DynamicServiceInstance 
[alive=true, host=cn109... with resources=<memory:83968, vCores:16>, 
shufflePort=15551, servicesAddress=..., mgmtPort=15004] with available 
capacity=16, pendingQueueSize=null, memory=83968
...
(the removals below are logged, but of course nothing is actually removed)
2016-09-02 16:52:01,490 [INFO] [StateChangeNotificationHandler] 
|tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|: Removed node 
with identity: f9b37b46-f629-4460-862f-f34183ba0a24
2016-09-02 16:52:01,567 [INFO] [StateChangeNotificationHandler] 
|tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|: Removed node 
with identity: 12399334-c743-4a9b-8224-8c0cbc21dea7
2016-09-02 16:52:01,776 [INFO] [StateChangeNotificationHandler] 
|tezplugins.LlapTaskSchedulerService$NodeStateChangeListener|: Removed node 
with identity: c7b50156-b4f9-4353-89a4-3d1a1ccea604
...
2016-09-02 16:53:39,511 [INFO] [LlapScheduler] 
|tezplugins.LlapTaskSchedulerService|: Assigned task 
TaskInfo{task=attempt_1466700718395_1343_2_07_000000_1, priority=140, 
startTime=0, containerId=null, assignedInstance=null, uniqueId=24, 
localityDelayTimeout=0} to container container_222212222_1343_01_000025 on 
node=DynamicServiceInstance [alive=true, host=cn109... with 
resources=<memory:83968, vCores:16>, shufflePort=15551, servicesAddress=..., 
mgmtPort=15004]
{noformat}
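
For illustration only, here is a minimal sketch of the bookkeeping I would expect on removal. This is not the actual LlapTaskSchedulerService code: NodeRegistrySketch, NodeInfo, workerNodeAdded, schedulableNodeCount and the String-keyed signatures are hypothetical stand-ins. Only the names instanceToNodeMap and workerNodeRemoved come from the snippet quoted in this issue.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a scheduler's node bookkeeping.
class NodeRegistrySketch {

    // Hypothetical stand-in for the scheduler's per-node state.
    static final class NodeInfo {
        final String host;
        volatile boolean enabled = true;

        NodeInfo(String host) {
            this.host = host;
        }
    }

    // Keyed by worker identity, as in the snippet quoted in the issue.
    private final Map<String, NodeInfo> instanceToNodeMap = new ConcurrentHashMap<>();

    void workerNodeAdded(String workerIdentity, String host) {
        instanceToNodeMap.put(workerIdentity, new NodeInfo(host));
    }

    // Drops the node from the map so the scheduler no longer considers it,
    // and marks it disabled for any caller still holding a reference.
    // Returns true if the identity was known.
    boolean workerNodeRemoved(String workerIdentity) {
        NodeInfo removed = instanceToNodeMap.remove(workerIdentity);
        if (removed == null) {
            return false;
        }
        removed.enabled = false;
        return true;
    }

    int schedulableNodeCount() {
        return instanceToNodeMap.size();
    }

    public static void main(String[] args) {
        NodeRegistrySketch registry = new NodeRegistrySketch();
        // Mirror the repro: 7 nodes, then flex down to 4.
        for (int i = 0; i < 7; i++) {
            registry.workerNodeAdded("id-" + i, "host-" + i);
        }
        for (int i = 4; i < 7; i++) {
            registry.workerNodeRemoved("id-" + i);
        }
        System.out.println("schedulable nodes: " + registry.schedulableNodeCount());
    }
}
```

With the remove call commented out, as in the current code, the count in this sketch would stay at 7 after the flex-down, which matches what the logs above suggest: removals are logged, but the dead nodes stay in the map.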

> LLAP: slow scheduling due to LlapTaskScheduler not removing nodes on kill 
> --------------------------------------------------------------------------
>
>                 Key: HIVE-14608
>                 URL: https://issues.apache.org/jira/browse/HIVE-14608
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Siddharth Seth
>
> ...and presumably doesn't disable them for scheduling. I haven't looked in 
> detail, though; I just see some harmless killed tasks in queries after I kill 
> some LLAP nodes manually between queries.
> {noformat}
>   public void workerNodeRemoved(ServiceInstance serviceInstance) {
>     // FIXME: disabling this for now
>     // instanceToNodeMap.remove(serviceInstance.getWorkerIdentity());
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
