[
https://issues.apache.org/jira/browse/SPARK-57376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hongze Zhang updated SPARK-57376:
---------------------------------
Description:
The issue:
When shuffle partition number is configured large, Spark could use significant
large amount of JVM heap memory just for storing the broadcasted shuffle
status, even with off-heap mode enabled. For fixing this, we can do some minor
enhancements around the shuffle status tracker and the base torrent broadcast
API.
The fix:
The fix can be split to 2 steps. Step 1 is to optimize against the broadcast
blocks to enable off-heap support, and step 2 is to optimize against the
broadcast values for the same purpose. Local sanity tests, and memory footprint
benchmark can be run for the final verification.
was:
When shuffle partition number is configured large, Spark could use significant
large amount of JVM heap memory just for storing the broadcasted shuffle
status, even with off-heap mode enabled. For fixing this, we can do some minor
enhancements around the shuffle status tracker and the base torrent broadcast
API.
The fix can be split to 2 steps. Step 1 is to optimize against the broadcast
blocks to enable off-heap support, and step 2 is to optimize against the
broadcast values for the same purpose. Local sanity tests, and memory footprint
benchmark can be run for the final verification.
> Add off-heap broadcasting support for shuffle status
> ----------------------------------------------------
>
> Key: SPARK-57376
> URL: https://issues.apache.org/jira/browse/SPARK-57376
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.1.2
> Reporter: Hongze Zhang
> Priority: Major
>
> The issue:
> When shuffle partition number is configured large, Spark could use
> significant large amount of JVM heap memory just for storing the broadcasted
> shuffle status, even with off-heap mode enabled. For fixing this, we can do
> some minor enhancements around the shuffle status tracker and the base
> torrent broadcast API.
>
> The fix:
> The fix can be split to 2 steps. Step 1 is to optimize against the broadcast
> blocks to enable off-heap support, and step 2 is to optimize against the
> broadcast values for the same purpose. Local sanity tests, and memory
> footprint benchmark can be run for the final verification.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]