attilapiros commented on PR #50033: URL: https://github.com/apache/spark/pull/50033#issuecomment-2779981829
The problem is not the race condition. See https://github.com/attilapiros/spark/commit/b555ab59d14bfd64db781baa45b03c9510e7d30d, where the tests are copied from your repo (note the empty `git diff` output after the `curl`):

```
➜  spark_II git:(SPARK-51272_attila_2) curl https://raw.githubusercontent.com/apache/spark/00a4aadb8cfce30f2234453c64b9ca46c60fa07f/core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala > core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  230k  100  230k    0     0   655k      0 --:--:-- --:--:-- --:--:--  657k
➜  spark_II git:(SPARK-51272_attila_2) git diff
➜  spark_II git:(SPARK-51272_attila_2)
➜  spark_II git:(SPARK-51272_attila_2) ./build/sbt "core/testOnly *DAGSchedulerSuite -- -z SPARK-51272"
...
[info] DAGSchedulerSuite:
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
[info] - SPARK-51272: retry all the partitions of result stage, if the first result task has failed and failing ShuffleMap stage is inDeterminate (1 second, 511 milliseconds)
[info] - SPARK-51272: retry all the partitions of result stage, if the first result task has failed with failing ShuffleStage determinate but result stage has another ShuffleStage which is indeterminate (299 milliseconds)
[info] - SPARK-51272: retry all the partitions of Shuffle stage, if any task of ShuffleStage has failed and failing ShuffleMap stage is inDeterminate (256 milliseconds)
[info] Run completed in 7 seconds, 731 milliseconds.
[info] Total number of tests run: 3
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 3, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 29 s, completed Apr 4, 2025, 5:31:51 PM
➜  spark_II git:(SPARK-51272_attila_2)
```

**But our solution is not enough: the `ResultStage` does not support revert/rollback, so re-executing all of its tasks may lead to an incorrect result!** There is a separate issue for this; please check out https://issues.apache.org/jira/browse/SPARK-25342.
