[ 
https://issues.apache.org/jira/browse/SPARK-57376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hongze Zhang updated SPARK-57376:
---------------------------------
    Description: 
The issue:

When shuffle partition number is configured large, Spark could use significant 
large amount of JVM heap memory just for storing the broadcasted shuffle 
status, even with off-heap mode enabled. For fixing this, we can do some minor 
enhancements around the shuffle status tracker and the base torrent broadcast 
API. 

 

The fix:

The fix can be split to 2 steps. Step 1 is to optimize against the broadcast 
blocks to enable off-heap support, and step 2 is to optimize against the 
broadcast values for the same purpose. Local sanity tests, and memory footprint 
benchmark can be run for the final verification.

  was:
When shuffle partition number is configured large, Spark could use significant 
large amount of JVM heap memory just for storing the broadcasted shuffle 
status, even with off-heap mode enabled. For fixing this, we can do some minor 
enhancements around the shuffle status tracker and the base torrent broadcast 
API. 

 

 The fix can be split to 2 steps. Step 1 is to optimize against the broadcast 
blocks to enable off-heap support, and step 2 is to optimize against the 
broadcast values for the same purpose. Local sanity tests, and memory footprint 
benchmark can be run for the final verification.


> Add off-heap broadcasting support for shuffle status
> ----------------------------------------------------
>
>                 Key: SPARK-57376
>                 URL: https://issues.apache.org/jira/browse/SPARK-57376
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 4.1.2
>            Reporter: Hongze Zhang
>            Priority: Major
>
> The issue:
> When shuffle partition number is configured large, Spark could use 
> significant large amount of JVM heap memory just for storing the broadcasted 
> shuffle status, even with off-heap mode enabled. For fixing this, we can do 
> some minor enhancements around the shuffle status tracker and the base 
> torrent broadcast API. 
>  
> The fix:
> The fix can be split to 2 steps. Step 1 is to optimize against the broadcast 
> blocks to enable off-heap support, and step 2 is to optimize against the 
> broadcast values for the same purpose. Local sanity tests, and memory 
> footprint benchmark can be run for the final verification.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to