[ https://issues.apache.org/jira/browse/SPARK-24622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519416#comment-16519416 ]
Thomas Graves commented on SPARK-24622:
---------------------------------------

Need to investigate further/test to make sure I am not missing anything.

> Task attempts in other stage attempts not killed when one task attempt succeeds
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-24622
>                 URL: https://issues.apache.org/jira/browse/SPARK-24622
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>            Priority: Major
>
> While looking through the code for https://github.com/apache/spark/pull/21577,
> I wanted to see how we kill task attempts. I don't see anywhere that we
> actually kill task attempts belonging to stage attempts other than the one
> that completed successfully.
>
> For instance:
> Stage 0.0 (stage id 0, attempt 0)
> - task 1.0 (task 1, attempt 0)
> Stage 0.1 (stage id 0, attempt 1), started due to a fetch failure, for instance
> - task 1.0 (task 1, attempt 0). Equivalent to task 1.0 in stage 0.0,
>   resubmitted because task 1.0 in stage 0.0 neither finished nor failed.
>
> Now if task 1.0 in stage 0.0 succeeds, it gets committed and marked as
> successful. We will mark the task in stage 0.1 as completed, but nowhere in
> the code do I see it actually kill task 1.0 in stage 0.1.
> Note that the scheduler does handle the case where we have 2 attempts
> (speculation) within a single stage attempt: it will kill the other attempt
> when one of them succeeds. See TaskSetManager.handleSuccessfulTask.
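For reference, here is a minimal standalone Scala sketch of the within-stage-attempt
behavior the last paragraph describes. This is not the actual Spark source: the
TaskInfo shape, TaskSetManagerSketch, and the kill step are illustrative stand-ins
for what TaskSetManager.handleSuccessfulTask does when a speculative duplicate of a
task exists inside the same task set.

{code:scala}
import scala.collection.mutable

// Illustrative stand-in for Spark's TaskInfo; fields are simplified.
case class TaskInfo(taskId: Long, attempt: Int, var running: Boolean)

class TaskSetManagerSketch {
  // All attempts launched per task index (speculation can add more than one).
  private val taskAttempts = mutable.Map[Int, List[TaskInfo]]()

  def launch(index: Int, info: TaskInfo): Unit =
    taskAttempts(index) = info :: taskAttempts.getOrElse(index, Nil)

  // Mirrors the idea in TaskSetManager.handleSuccessfulTask: once one attempt
  // of a task succeeds, kill every other still-running attempt of the same
  // task *within this task set* (i.e., within this stage attempt only).
  def handleSuccessfulTask(index: Int, winner: TaskInfo): Unit = {
    winner.running = false
    for (other <- taskAttempts.getOrElse(index, Nil) if other.running) {
      println(s"Killing task ${other.taskId} (attempt ${other.attempt}): " +
        "another attempt succeeded")
      other.running = false // the real code asks the backend to kill the task
    }
  }
}

object Demo extends App {
  val tsm = new TaskSetManagerSketch
  val a0 = TaskInfo(taskId = 10L, attempt = 0, running = true)
  val a1 = TaskInfo(taskId = 11L, attempt = 1, running = true) // speculative copy
  tsm.launch(1, a0)
  tsm.launch(1, a1)
  tsm.handleSuccessfulTask(1, a0) // a1 is killed; both are in the same task set
}
{code}

The gap this issue points at is that nothing equivalent runs across task sets:
each stage attempt (0.0 and 0.1 in the example above) has its own TaskSetManager,
so when task 1.0 of stage 0.0 succeeds, no manager iterates over stage 0.1's
running attempts to kill its copy of task 1.0.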