peter-toth commented on PR #50757: URL: https://github.com/apache/spark/pull/50757#issuecomment-2848539891
I see many new comments on this PR (some even seem to be unrelated), but my understanding after https://github.com/apache/spark/pull/50757#issuecomment-2844972082 is that we will do runtime shuffle checksum validation to determine whether a stage behaves non-deterministically, so this PR (and https://github.com/apache/spark/pull/50029), which aims to mark a stage non-deterministic based on the related DF query plans/expressions, is not needed. What we could probably do instead is mark a stage deterministic based on the query plan, to avoid shuffle checksum computation on surely deterministic stages and save some costs (https://github.com/apache/spark/pull/50757#issuecomment-2845349959). This is because we know the deterministic behaviour of stages assembled from a DF query, but we don't know it when stages are created via the RDD API.
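
To make the proposal concrete, here is a minimal sketch of the decision, not Spark code. `Origin`, `StageInfo`, `plan_is_deterministic`, and `needs_checksum_validation` are hypothetical names invented for illustration. The sketch assumes a plan-based analysis can answer "surely deterministic" only for stages assembled from a DF query, and that every other stage falls back to runtime checksum validation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    """How the stage was created (hypothetical, for illustration only)."""
    DATAFRAME = "dataframe"  # stage assembled from a DF query plan
    RDD = "rdd"              # stage created directly via the RDD API


@dataclass(frozen=True)
class StageInfo:
    """Hypothetical stand-in for a stage; this is not a Spark class."""
    origin: Origin
    # Outcome of a plan-based analysis. None means no verdict is available,
    # which is always the case for stages created via the RDD API.
    plan_is_deterministic: Optional[bool] = None


def needs_checksum_validation(stage: StageInfo) -> bool:
    """Return True if the runtime shuffle checksum should be computed."""
    # Skip the checksum only when the query plan proves determinism.
    if stage.origin is Origin.DATAFRAME and stage.plan_is_deterministic is True:
        return False
    # Unknown or non-deterministic: rely on runtime checksum validation.
    return True
```

Under these assumptions the plan is used only as an optimization: a missing or negative verdict never marks a stage as non-deterministic, it just leaves the runtime check enabled.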