[ https://issues.apache.org/jira/browse/HIVE-16104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902341#comment-15902341 ]
Siddharth Seth edited comment on HIVE-16104 at 3/9/17 2:10 AM:
---------------------------------------------------------------

Thank you. Still some parts which should not have changed... Anyway. Actual review comments this time:

- Thread.sleep(PREEMPTION_KILL_GRACE_SLEEP_MS) - it should be possible to replace this with a timed wait, similar to the way the main scheduling loop waits for a fragment to complete or get scheduled before it tries to schedule the next one. That way we won't actually sleep for the entire 100ms.
- Another nice-to-have would be to allow the element to go back into the queue, and then fetch the latest from the queue - which could be at a higher priority. Don't think this is critical though.
- In TaskRunnerCallable, killTask needs a minor change: shouldRunTask = false needs to be set independent of the (taskRunner) condition it is under. Having a fragment wait outside the queue increases the possibility of an issue like this being hit. (The nice-to-have of putting the fragment back in the queue and waiting for the next schedule attempt would also limit the possibility of other such issues, if they exist.)
- Any chance of adding tests? The test class is set up with some controlled scheduling constructs to help with this. Suspect it'll need more work for the timed wait though.
- System.nanoTime - replace with the Clock instance already used in the class. (Helps unit tests simulate time changes; may not be plugged in yet.)
- Nit: lastKillTimeNs - Long (null) vs long (-1/special value for unset)? Not required.

{code}
// TODO: this can all be replaced by a Thread with a Runnable and a catch block
{code}
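The timed-wait suggestion above could look roughly like the sketch below. This is a hedged illustration, not the actual LlapTaskScheduler code: the class and method names (GraceWaitSketch, awaitGraceOrCompletion, onTaskFinished) are hypothetical. The idea is that the grace-period loop waits on a condition with a deadline, so it wakes as soon as the pre-empted task finishes rather than sleeping the full 100ms.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a timed condition wait replacing
// Thread.sleep(PREEMPTION_KILL_GRACE_SLEEP_MS).
class GraceWaitSketch {
  static final long PREEMPTION_KILL_GRACE_SLEEP_MS = 100;

  private final ReentrantLock lock = new ReentrantLock();
  private final Condition stateChanged = lock.newCondition();
  private boolean taskFinished = false;

  // Wait up to the grace period; returns true early if the task finished.
  boolean awaitGraceOrCompletion() {
    long remainingNs = TimeUnit.MILLISECONDS.toNanos(PREEMPTION_KILL_GRACE_SLEEP_MS);
    lock.lock();
    try {
      // Loop guards against spurious wakeups; awaitNanos returns the time left.
      while (!taskFinished && remainingNs > 0) {
        try {
          remainingNs = stateChanged.awaitNanos(remainingNs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
      return taskFinished;
    } finally {
      lock.unlock();
    }
  }

  // Invoked when the pre-empted task completes; wakes the waiting loop immediately.
  void onTaskFinished() {
    lock.lock();
    try {
      taskFinished = true;
      stateChanged.signalAll();
    } finally {
      lock.unlock();
    }
  }
}
```

This mirrors how a scheduling loop can block on "fragment completed or scheduled" rather than polling on a fixed sleep.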
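The killTask fix can be sketched as follows; again this is a hypothetical simplification (TaskRunnerCallableSketch, callTask, and the Object stand-in for the task runner are illustrative, not the real TaskRunnerCallable internals). The point is that the shouldRunTask flag is cleared unconditionally, not only inside the branch that requires a taskRunner to already exist:

```java
// Hypothetical sketch of the killTask change: shouldRunTask = false is set
// regardless of whether the task has started, so a fragment killed while
// waiting outside the queue never begins running afterwards.
class TaskRunnerCallableSketch {
  private volatile boolean shouldRunTask = true;
  private volatile Object taskRunner; // stand-in for the real task runner object

  void killTask() {
    // Clear the flag first, independent of the (taskRunner) condition.
    shouldRunTask = false;
    Object runner = taskRunner;
    if (runner != null) {
      // interrupt/kill the already-running task here (elided)
    }
  }

  // Returns false when the fragment was killed before it ever started running.
  boolean callTask() {
    if (!shouldRunTask) {
      return false;
    }
    taskRunner = new Object();
    return true;
  }
}
```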
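The last two points (using an injectable clock instead of System.nanoTime, and a long sentinel for lastKillTimeNs) can be sketched together. The TimeSource/FakeTimeSource/PreemptionTracker names are hypothetical, standing in for whatever Clock abstraction the class already uses; the benefit is that a test can advance time deterministically:

```java
// Hypothetical sketch: inject a time source so tests can simulate time,
// and use a long with a -1 sentinel for "never killed" instead of a boxed Long.
interface TimeSource {
  long nanoTime();
}

// Production implementation delegates to System.nanoTime().
class SystemTimeSource implements TimeSource {
  public long nanoTime() { return System.nanoTime(); }
}

// Controllable time source for unit tests.
class FakeTimeSource implements TimeSource {
  private long now = 0;
  public long nanoTime() { return now; }
  void advance(long nanos) { now += nanos; }
}

class PreemptionTracker {
  private final TimeSource clock;
  private long lastKillTimeNs = -1; // -1 = unset; avoids allocating a Long

  PreemptionTracker(TimeSource clock) { this.clock = clock; }

  void recordKill() { lastKillTimeNs = clock.nanoTime(); }

  // True once at least graceNs has passed since the last recorded kill.
  boolean graceElapsed(long graceNs) {
    return lastKillTimeNs != -1 && clock.nanoTime() - lastKillTimeNs >= graceNs;
  }
}
```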
> LLAP: preemption may be too aggressive if the pre-empted task doesn't die
> immediately
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-16104
>                 URL: https://issues.apache.org/jira/browse/HIVE-16104
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16104.01.patch, HIVE-16104.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)