attilapiros commented on code in PR #50757: URL: https://github.com/apache/spark/pull/50757#discussion_r2069316385
##########
core/src/main/scala/org/apache/spark/rdd/RDD.scala:
##########

Review Comment:

@ahshahid

Regarding point 3: I was open to your change and asked you to extend your integration test with some extra logs to prove that it tests the specific case where I have seen problems. But that got stuck here: https://github.com/attilapiros/spark/pull/8#issuecomment-2797246777

Regarding point 5: I see a lot of value in the `inDeterministic` flag, as currently we cannot distinguish whether a shuffle map stage is indeterministic because of its parent or on its own. Let's say `ShuffleMapStageX` is indeterministic because of its own operation, and along the way to a result stage there is another `ShuffleMapStageY` which is indeterministic only because it is a descendant of `ShuffleMapStageX`. If the result stage is fetching from `ShuffleMapStageY` when the fetch failure happens, we still have a deterministic output, so even if the result stage is half ready we can continue our work without reverting its output.

(In addition, later it would make sense to extend the RDD API to let a user set the flag when they are using an indeterminate operation in the map/flatMap body.)
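
To make the distinction in point 5 concrete, here is a minimal sketch using a toy model. It does not use Spark's actual `Stage` or `RDD` classes: `StageNode`, `selfIndeterminate`, `Cause` and `cause` are hypothetical names made up for illustration, and the sketch only shows how a per-stage flag could separate "indeterministic on its own" from "indeterministic only through a parent".

```scala
// Toy model only: these are NOT Spark's Stage/RDD classes.
object DeterminismSketch {
  // `selfIndeterminate` is a hypothetical per-stage flag: true when the
  // stage's own operation may produce different output when re-executed.
  final case class StageNode(
      name: String,
      selfIndeterminate: Boolean,
      parents: Seq[StageNode] = Seq.empty)

  // Why a stage is (or is not) considered indeterministic.
  sealed trait Cause
  case object Deterministic extends Cause
  case object OwnOperation extends Cause
  case object InheritedOnly extends Cause

  // Classify a stage by looking at its own flag first, then at its ancestors.
  def cause(stage: StageNode): Cause =
    if (stage.selfIndeterminate) OwnOperation
    else if (stage.parents.exists(p => cause(p) != Deterministic)) InheritedOnly
    else Deterministic

  // The two stages from the comment, plus an unrelated deterministic stage.
  val stageX: StageNode =
    StageNode("ShuffleMapStageX", selfIndeterminate = true)
  val stageY: StageNode =
    StageNode("ShuffleMapStageY", selfIndeterminate = false, parents = Seq(stageX))
  val stageZ: StageNode =
    StageNode("ShuffleMapStageZ", selfIndeterminate = false)
}
```

In this model `cause(stageX)` is `OwnOperation` and `cause(stageY)` is `InheritedOnly`, whereas a single inherited boolean would report both stages simply as indeterministic.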