mridulm commented on code in PR #50230:
URL: https://github.com/apache/spark/pull/50230#discussion_r2040114948

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -5724,6 +5724,21 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val SHUFFLE_ORDER_INDEPENDENT_CHECKSUM_ENABLED =
+    buildConf("spark.shuffle.orderIndependentChecksum.enabled")
+      .doc("Whether to calculate order independent checksum for the shuffle data or not. If " +
+        "enabled, Spark will calculate a checksum that is independent of the input row order for " +
+        "each mapper and returns the checksums from executors to driver. Different from the above" +
+        "checksum, the order independent remains the same even if the shuffle row order changes. " +
+        "While the above checksum is sensitive to shuffle data ordering to detect file " +
+        "corruption. This checksum is used to detect whether different task attempts of the same " +
+        "partition produce different output data or not (same set of keyValue pairs). In case " +
+        "the output data has changed across retries, Spark will need to retry all tasks of the " +
+        "consumer stages to avoid correctness issues.")
+      .version("4.1.0")

Review Comment:
   @cloud-fan, it is too late for 4.0 - let us move it to 4.1

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

For additional commands, e-mail: reviews-h...@spark.apache.org
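
For readers following the doc string in the diff above, here is a minimal sketch of what "order independent" means for a checksum. It is not taken from the PR and is not Spark's implementation: the object name, the helper names, the use of CRC32 per row, and the choice of addition as the combining step are all assumptions made for illustration. The idea is only that per-row checksums are merged with a commutative operation, so two task attempts that emit the same set of key/value pairs in a different order agree, while an attempt that emits different data does not.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.CRC32

// Hypothetical illustration only; none of these names come from Spark.
object OrderIndependentChecksumSketch {

  // CRC32 of a single key/value pair. A zero byte separates key and value
  // so that ("ab", "c") and ("a", "bc") do not hash the same bytes.
  def rowChecksum(key: String, value: String): Long = {
    val crc = new CRC32()
    crc.update(key.getBytes(StandardCharsets.UTF_8))
    crc.update(0)
    crc.update(value.getBytes(StandardCharsets.UTF_8))
    crc.getValue
  }

  // Combine per-row checksums with Long addition. Addition is commutative
  // and associative, so the result does not depend on row order.
  def checksum(rows: Seq[(String, String)]): Long =
    rows.foldLeft(0L) { case (acc, (k, v)) => acc + rowChecksum(k, v) }

  def main(args: Array[String]): Unit = {
    val attempt1 = Seq("a" -> "1", "b" -> "2", "c" -> "3")
    val attempt2 = attempt1.reverse                         // same rows, new order
    val attempt3 = Seq("a" -> "1", "b" -> "2", "c" -> "4")  // one row changed

    println(checksum(attempt1) == checksum(attempt2)) // true
    println(checksum(attempt1) == checksum(attempt3)) // false
  }
}
```

Addition is used here rather than XOR because XOR would cancel out a pair of identical rows; whether the PR makes the same choice is not stated in this thread.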