peter-toth commented on PR #50757:
URL: https://github.com/apache/spark/pull/50757#issuecomment-2848539891

   I see many new comments on this PR (some even seem to be unrelated), but my 
understanding after 
https://github.com/apache/spark/pull/50757#issuecomment-2844972082 is that we 
will do runtime shuffle checksum validation to determine whether a stage behaves 
non-deterministically. So this PR (and 
https://github.com/apache/spark/pull/50029), which aims to mark a stage 
non-deterministic based on the related DF query plans/expressions, is not needed.
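
   To make the runtime-validation idea concrete, here is a minimal sketch, not the actual Spark implementation. It assumes a hypothetical per-partition checksum that is order-insensitive (a sum of per-record CRC32 values modulo 2^32), so reordering rows inside one partition is not flagged. A stage is treated as non-deterministic when a retried attempt produces different per-partition checksums than the first attempt. All function names (`hash_partition`, `round_robin_partition`, `partition_checksums`, `is_nondeterministic`) are illustrative only.

```python
import zlib
from typing import List, Sequence, Tuple


def hash_partition(records: Sequence[str], n: int) -> List[Tuple[int, str]]:
    # Key-based assignment: depends only on the record, not on input order.
    return [(zlib.crc32(repr(r).encode()) % n, r) for r in records]


def round_robin_partition(records: Sequence[str], n: int) -> List[Tuple[int, str]]:
    # Position-based assignment (like repartition(n) without keys):
    # the target partition depends on the order the records arrive in.
    return [(i % n, r) for i, r in enumerate(records)]


def partition_checksums(assigned: List[Tuple[int, str]], n: int) -> List[int]:
    # Hypothetical order-insensitive checksum: sum of record CRC32s mod 2^32.
    sums = [0] * n
    for pid, record in assigned:
        sums[pid] = (sums[pid] + zlib.crc32(repr(record).encode())) & 0xFFFFFFFF
    return sums


def is_nondeterministic(first: List[int], retry: List[int]) -> bool:
    # Any partition whose content changed between attempts marks the stage.
    return first != retry


# The upstream stage emits the same rows, but in a different order on retry.
first_input = ["a", "b", "c", "d"]
retry_input = ["b", "a", "c", "d"]
```

   With `hash_partition` both attempts yield identical checksums, while with `round_robin_partition` rows `a` and `b` swap partitions on retry, so the checksums differ and the stage is flagged.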
   
   Actually, what we could probably do is mark a stage deterministic based on 
the query plan, to avoid shuffle checksum computation on stages that are surely 
deterministic and so save some costs 
(https://github.com/apache/spark/pull/50757#issuecomment-2845349959). This is 
because we know the deterministic behaviour of stages assembled from a DF query, 
but we don't know it for stages created via the RDD API.
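
   A rough sketch of that decision, under stated assumptions: each stage optionally carries the DF plan it was built from (`None` when it was created via the RDD API), and every plan node exposes a `deterministic` flag, similar in spirit to Catalyst's `Expression.deterministic`. `PlanNode`, `Stage` and `needs_shuffle_checksum` are hypothetical names, not Spark APIs.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    name: str
    deterministic: bool  # hypothetical per-node flag derived from its expressions
    children: List["PlanNode"] = field(default_factory=list)


@dataclass
class Stage:
    plan: Optional[PlanNode]  # None: stage was built via the RDD API


def plan_is_deterministic(node: PlanNode) -> bool:
    # A plan is deterministic only if every node in the tree is.
    return node.deterministic and all(plan_is_deterministic(c) for c in node.children)


def needs_shuffle_checksum(stage: Stage) -> bool:
    if stage.plan is None:
        # RDD API: determinism is unknown, so keep the runtime validation.
        return True
    # DF query: skip the checksum only when the plan is surely deterministic.
    return not plan_is_deterministic(stage.plan)
```

   The check is deliberately conservative: the cost saving only applies when determinism is provable from the plan, and anything unknown falls back to runtime checksum validation.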


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org