imarch1 zhang created SPARK-51460:
-------------------------------------

             Summary: Shuffle read and write are inconsistent when push-based 
shuffle is enabled
                 Key: SPARK-51460
                 URL: https://issues.apache.org/jira/browse/SPARK-51460
             Project: Spark
          Issue Type: Bug
          Components: Shuffle
    Affects Versions: 3.3.0
            Reporter: imarch1 zhang


When push-based shuffle is enabled, some Spark applications in our cluster 
show inconsistent shuffle data. The Exchange metrics are as follows:

!image-2025-03-11-09-04-04-265.png!

As the picture shows, reduce tasks read more data than the map tasks wrote.

The only clue we have found is that the number of records read by all 
*successful* reduce tasks matches the number of records written, which is 
1,529,614,111. We have not been able to determine where the additional 
records (1,529,974,564 - 1,529,614,111 = 360,453) in Exchange come from.
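
For reference, the size of the discrepancy can be worked out with a few lines 
of Python. This is only a sketch of the arithmetic: the two counts are the 
ones quoted above, while the variable names are illustrative and do not come 
from Spark.

```python
# Record counts quoted in this report (variable names are illustrative).
records_written = 1_529_614_111  # written by map tasks; also read by successful reduce tasks
records_read = 1_529_974_564     # total records read, as shown in the Exchange metrics

# Surplus records that show up on the read side.
surplus = records_read - records_written
surplus_ratio = surplus / records_written

print(f"surplus records: {surplus:,}")  # surplus records: 360,453
print(f"surplus ratio: {surplus_ratio:.6%}")
```

The surplus is small relative to the total, so it is easy to miss unless the 
read and write counts are compared directly.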



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
