attilapiros commented on code in PR #50757:
URL: https://github.com/apache/spark/pull/50757#discussion_r2069316385


##########
core/src/main/scala/org/apache/spark/rdd/RDD.scala:
##########


Review Comment:
   @ahshahid Regarding point 3: I was open to your change and asked you to 
extend your integration test with some extra logs to prove it is testing the 
specific case where I have seen problems. But that got stuck here: 
https://github.com/attilapiros/spark/pull/8#issuecomment-2797246777
   
   Regarding point 5: I see a lot of value in the `inDeterministic` flag, 
because currently we cannot distinguish whether a shuffle map stage is 
indeterministic because of its parent or on its own. Let's say 
`ShuffleMapStageX` is indeterministic because of its own operation, and along 
the way to a result stage there is another `ShuffleMapStageY` which is 
indeterministic only because it is a descendant of `ShuffleMapStageX`. If the 
result stage is fetching from `ShuffleMapStageY` when the fetch failure 
happens, we still have a deterministic output, so even if the result stage is 
half ready we can continue our work without reverting its output. (In 
addition, later it would make sense to extend the RDD API to let a user set it 
when they are using an indeterminate operation in the map/flatMap body.) 
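   
   To make the `ShuffleMapStageX` / `ShuffleMapStageY` case concrete, here is a 
minimal sketch of the distinction. It is a simplified model only and does not 
use Spark's actual `Stage` or `DAGScheduler` classes: `StageModel`, 
`selfIndeterminate`, `inheritedIndeterminate`, and `mustRevertPartialResult` 
are all hypothetical names made up for illustration.

```scala
// Hypothetical, simplified model of a stage; NOT Spark's Stage class.
final case class StageModel(
    name: String,
    // True only if the stage's own operation is indeterministic.
    selfIndeterminate: Boolean,
    parents: Seq[StageModel] = Seq.empty) {

  // The coarse view: true if this stage or any ancestor is indeterministic.
  // With only this view, X and Y below look the same.
  def inheritedIndeterminate: Boolean =
    selfIndeterminate || parents.exists(_.inheritedIndeterminate)
}

object RollbackSketch {
  // Assumed rule for this sketch: a half-ready result stage has to be
  // reverted only if one of the stages being recomputed is indeterministic
  // on its own. Recomputing a stage that merely inherits the property reads
  // the same parent output again, so it reproduces the same data.
  def mustRevertPartialResult(recomputed: Seq[StageModel]): Boolean =
    recomputed.exists(_.selfIndeterminate)
}
```

   With `x = StageModel("ShuffleMapStageX", selfIndeterminate = true)` and 
`y = StageModel("ShuffleMapStageY", selfIndeterminate = false, Seq(x))`, both 
stages report `inheritedIndeterminate`, but a fetch failure that only forces 
`y` to be recomputed does not require reverting the result stage in this model.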
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

