imarch1 zhang created SPARK-51460:
-------------------------------------

             Summary: Shuffle read and write are inconsistent when push-based shuffle is enabled
                 Key: SPARK-51460
                 URL: https://issues.apache.org/jira/browse/SPARK-51460
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.3.0
            Reporter: imarch1 zhang

With push-based shuffle enabled, some Spark applications in our cluster produced inconsistent shuffle data. The Exchange metrics are as follows:

!image-2025-03-11-09-04-04-265.png!

As the picture shows, the reduce tasks read more data than the map tasks wrote. The only clue we have found is that the number of records read by all *successful* reduce tasks matches the number of records written, which is 1,529,614,111.

We have not been able to determine where the additional records (1,529,974,564 - 1,529,614,111 = 360,453) shown in Exchange come from.
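
The comparison described in this report can be sketched as follows. This is a minimal illustration and not Spark code: the `TaskAttempt` record, the `records_read_by_status` helper, and the sample values are all hypothetical, and in practice the per-attempt numbers would have to be collected from the Spark UI or the event log. Only the two totals are taken from the report.

```python
from dataclasses import dataclass

# Totals quoted in the report.
RECORDS_WRITTEN = 1_529_614_111  # shuffle records written by map tasks
RECORDS_READ = 1_529_974_564     # shuffle records read, as shown in Exchange

# Unexplained surplus on the read side.
extra_records = RECORDS_READ - RECORDS_WRITTEN


@dataclass
class TaskAttempt:
    """Hypothetical per-attempt record; not a Spark API type."""
    task_id: int
    status: str        # e.g. "SUCCESS", "FAILED", "KILLED"
    records_read: int


def records_read_by_status(attempts):
    """Sum shuffle records read, grouped by attempt status."""
    totals = {}
    for attempt in attempts:
        totals[attempt.status] = totals.get(attempt.status, 0) + attempt.records_read
    return totals


# Made-up sample data, for illustration only.
sample = [
    TaskAttempt(0, "SUCCESS", 100),
    TaskAttempt(1, "FAILED", 30),
    TaskAttempt(1, "SUCCESS", 120),
]
totals = records_read_by_status(sample)

print(extra_records)
print(totals)
```

If the total for successful attempts equals the number of records written while the total over all attempts is higher, that would suggest the surplus is confined to attempts that did not succeed, which is the pattern the report describes.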