[ https://issues.apache.org/jira/browse/HIVE-23443?focusedWorklogId=434227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-434227 ]

ASF GitHub Bot logged work on HIVE-23443:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/May/20 18:09
            Start Date: 17/May/20 18:09
    Worklog Time Spent: 10m
      Work Description: pgaref commented on a change in pull request #1012:
URL: https://github.com/apache/hive/pull/1012#discussion_r426288840



##########
File path: llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java
##########
@@ -884,10 +885,20 @@ private void finishableStateUpdated(TaskWrapper taskWrapper, boolean newFinishab
         taskWrapper.updateCanFinishForPriority(newFinishableState);
         forceReinsertIntoQueue(taskWrapper, isRemoved);
       } else {
-        taskWrapper.updateCanFinishForPriority(newFinishableState);
-        if (!newFinishableState && !taskWrapper.isInPreemptionQueue()) {
-          // No need to check guaranteed here; if it was false we would already be in the queue.
+        // if speculative task, any finishable state change should re-order the queue as speculative tasks are always
+        // not-guaranteed (re-order helps put non-finishable's ahead of finishable)
+        if (!taskWrapper.isGuaranteed()) {
+          removeFromPreemptionQueue(taskWrapper);
+          taskWrapper.updateCanFinishForPriority(newFinishableState);
           addToPreemptionQueue(taskWrapper);
+        } else {
+          // if guaranteed task, if the finishable state changed to non-finishable and if the task doesn't exist
+          // pre-emption queue, then add it so that it becomes candidate to kill
+          taskWrapper.updateCanFinishForPriority(newFinishableState);

Review comment:
   @prasanthj thanks for fixing this! Patch looks good and is now committed!
   
   At some point I would also change the first comment of the method to clarify that a task that is both Guaranteed and Finishable should never be in the preemption queue.
   https://github.com/apache/hive/pull/1012/files#diff-16658bf15468ecd089c4fd32e75fa8b2R876
   
   I believe there is value in putting some effort into documenting and describing how the scheduler works (I started keeping some notes but am happy to add more work).
   I personally find the information in this area of the project very limited.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 434227)
    Time Spent: 50m  (was: 40m)

> LLAP speculative task pre-emption seems to be not working
> ---------------------------------------------------------
>
>                 Key: HIVE-23443
>                 URL: https://issues.apache.org/jira/browse/HIVE-23443
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>         Attachments: HIVE-23443.1.patch, HIVE-23443.2.patch, 
> HIVE-23443.3.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> I think after HIVE-23210 we are getting a stable sort order and it is causing 
> pre-emption to not work in certain cases.
> {code:java}
> "attempt_1589167813851_0000_119_01_000008_0 
> (hive_20200511055921_89598f09-19f1-4969-ab7a-82e2dd796273-119/Map 1, started 
> at 2020-05-11 05:59:22, in preemption queue, can finish)", 
> "attempt_1589167813851_0008_84_01_000008_1 
> (hive_20200511055928_7ae29ca3-e67d-4d1f-b193-05651023b503-84/Map 1, started 
> at 2020-05-11 06:00:23, in preemption queue, can finish)" {code}
> Scheduler only peeks at the pre-emption queue and looks at whether it is 
> non-finishable. 
> [https://github.com/apache/hive/blob/master/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskExecutorService.java#L420]
> In the above case, all tasks are speculative, but a state change is not 
> triggering pre-emption queue re-ordering, so peek() always returns a canFinish 
> task even though non-finishable tasks are in the queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
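
The failure mode described in the issue can be reproduced in isolation. The following is a minimal sketch, not the Hive code: it assumes the pre-emption queue behaves like a plain java.util.PriorityQueue ordered so that non-finishable tasks come first, and the Task class, the ORDER comparator and both update methods are hypothetical names made up for illustration. It shows that changing a task's canFinish flag in place leaves the head of the queue stale, while the remove / update / re-add sequence used in the patch restores the ordering.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class PreemptionQueueSketch {

    // Hypothetical stand-in for a queued task; this is NOT Hive's TaskWrapper.
    static final class Task {
        final String id;
        final long startTime;
        boolean canFinish;

        Task(String id, long startTime, boolean canFinish) {
            this.id = id;
            this.startTime = startTime;
            this.canFinish = canFinish;
        }
    }

    // Assumed ordering: non-finishable tasks (canFinish == false) sort ahead of
    // finishable ones; ties are broken by start time.
    static final Comparator<Task> ORDER =
        Comparator.comparing((Task t) -> t.canFinish)
                  .thenComparingLong(t -> t.startTime);

    // Mutates the sort key while the task is still inside the heap.
    // The queue is never told, so its internal order is not adjusted.
    static void updateInPlace(Task task, boolean canFinish) {
        task.canFinish = canFinish;
    }

    // Same order of operations as the non-guaranteed branch in the patch:
    // remove, update the key, then add back so the heap re-sorts the task.
    static void updateWithReinsert(PriorityQueue<Task> queue, Task task, boolean canFinish) {
        queue.remove(task);
        task.canFinish = canFinish;
        queue.add(task);
    }

    public static void main(String[] args) {
        PriorityQueue<Task> queue = new PriorityQueue<>(ORDER);
        Task first = new Task("task-a", 1L, true);
        Task second = new Task("task-b", 2L, true);
        queue.add(first);
        queue.add(second);

        // task-b becomes non-finishable, but the head is still the finishable task-a.
        updateInPlace(second, false);
        System.out.println("after in-place update, head = " + queue.peek().id);

        // Re-inserting task-b moves it to the head, where peek() can see it.
        updateWithReinsert(queue, second, false);
        System.out.println("after re-insert, head = " + queue.peek().id);
    }
}
```

With a scheduler that only peeks at the head, the first case never finds a pre-emption candidate even though one is queued; the second case does.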