ahshahid commented on code in PR #50033:
URL: https://github.com/apache/spark/pull/50033#discussion_r2025405657


##########
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala:
##########
@@ -3185,16 +3202,164 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti
       "Spark can only do this while using the new shuffle block fetching protocol"))
   }
 
+
+
+  test("SPARK-51272: retry all the partitions of result stage, if the first result task" +
+    " has failed and failing ShuffleMap stage is inDeterminate") {
+    this.dagSchedulerInterceptor = createDagInterceptorForSpark51272(
+      () => taskSets(2).tasks(1), "RELEASE_LATCH")
+
+    val numPartitions = 2
+    // The first shuffle stage is completed by the below function itself, which creates two
+    // indeterminate stages.
+    val (shuffleId1, shuffleId2) = constructTwoStages(
+      stage1InDeterminate = false, stage2InDeterminate = true)
+    completeShuffleMapStageSuccessfully(shuffleId2, 0, numPartitions)
+    val resultStage = scheduler.stageIdToStage(2).asInstanceOf[ResultStage]
+    val activeJob = resultStage.activeJob
+    assert(activeJob.isDefined)
+    // The result stage is still waiting for its 2 tasks to complete
+    assert(resultStage.findMissingPartitions() == Seq.tabulate(numPartitions)(i => i))
+
+    // The event below initiates the retry of the previous indeterminate stages, and also
+    // the retry of all result tasks. But before the "ResubmitFailedStages" event is added
+    // to the scheduler's queue, a successful completion of a result partition task is added
+    // to the event queue. In this scenario the bug surfaces: instead of retrying all
+    // partitions of the result stage (2 tasks in total), only some (1 task) get retried.
+    runEvent(
+      makeCompletionEvent(
+        taskSets(2).tasks(0),
+        FetchFailed(makeBlockManagerId("hostA"), shuffleId1, 0L, 0, 0, "ignored"),
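
The race the test comment describes can be sketched with a toy event loop. This is a minimal model, not Spark's actual `DAGScheduler` code: the `Event`, `TaskSucceeded`, and `ResubmitFailedStages` names here are made up for illustration of why processing a task success before the resubmit event shrinks the "missing partitions" set.

```scala
// Toy model (NOT Spark code) of the ordering bug: if a result-task
// success is processed before ResubmitFailedStages, a naive resubmit
// that only retries still-missing partitions will skip the partition
// that "succeeded" against stale, indeterminate shuffle output.
sealed trait Event
case class TaskSucceeded(partition: Int) extends Event
case object ResubmitFailedStages extends Event

def retriedOnResubmit(events: Seq[Event], numPartitions: Int): Seq[Int] = {
  val finished = scala.collection.mutable.Set.empty[Int]
  var retried: Seq[Int] = Seq.empty
  events.foreach {
    case TaskSucceeded(p)     => finished += p            // mark partition done
    case ResubmitFailedStages => retried = (0 until numPartitions).filterNot(finished)
  }
  retried
}

// Partition 1 succeeds before the resubmit event is handled,
// so only partition 0 is retried, even though both should be.
println(retriedOnResubmit(Seq(TaskSucceeded(1), ResubmitFailedStages), 2))
```

With the ordering reversed (resubmit processed first), both partitions would be retried, which is the behavior the test asserts for an indeterminate parent stage.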

Review Comment:
   I see.. thanks for pointing that out. Instead of direct dependencies of the result
stage, the test constructs indirect dependencies; let me rectify the test. Though it
appears from the existing source code that the dependencies are just flattened out.
That is, a chain where Result depends on Shuffle1, which in turn depends on Shuffle2,
and a shape where Result depends directly on both Shuffle1 and Shuffle2, both result
in the same list of dependencies.
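
The flattening claim above can be illustrated with a toy model. This is a hypothetical `Stage` case class for the sake of the example, not Spark's actual `ShuffleMapStage`/`ResultStage` hierarchy: a linear chain and a direct two-parent shape yield the same de-duplicated set of ancestor stages.

```scala
// Toy stage with a name and parent stages (hypothetical, not Spark's classes).
case class Stage(name: String, parents: List[Stage]) {
  // All transitive ancestors, de-duplicated -- the "flattened" dependency list.
  def allAncestors: Set[Stage] =
    parents.toSet ++ parents.flatMap(_.allAncestors)
}

val shuffle2 = Stage("Shuffle2", Nil)

// Chain: Result -> Shuffle1 -> Shuffle2
val shuffle1Chain = Stage("Shuffle1", List(shuffle2))
val resultChain   = Stage("Result", List(shuffle1Chain))

// Direct: Result -> {Shuffle1, Shuffle2}
val shuffle1Direct = Stage("Shuffle1", Nil)
val resultDirect   = Stage("Result", List(shuffle1Direct, shuffle2))

// Both shapes flatten to the same ancestor names.
println(resultChain.allAncestors.map(_.name))
println(resultDirect.allAncestors.map(_.name))
```

Under this model both shapes produce the ancestor set {Shuffle1, Shuffle2}, which matches the observation that the two dependency structures are indistinguishable once flattened.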



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

