Lijie Xu created SPARK-17325:
--------------------------------
Summary: Inconsistent Spillable threshold and AppendOnlyMap
growing threshold may trigger out-of-memory errors
Key: SPARK-17325
URL: https://issues.apache.org/jira/browse/SPARK-17325
Project: Spark
Issue Type: Bug
Components: Shuffle, Spark Core
Affects Versions: 2.0.0, 1.6.2
Reporter: Lijie Xu
While reading the shuffle source code, I found what looks like a potential
out-of-memory error in ExternalSorter.
The problem is that the memory usage of AppendOnlyMap (i.e.,
PartitionedAppendOnlyMap in ExternalSorter) can greatly exceed its spillable
threshold: `currentMemory` can reach twice the size of `myMemoryThreshold` in
`Spillable.maybeSpill()`. This means the task's execution memory usage (the
AppendOnlyMap) can greatly exceed its defined execution memory limit (roughly a
(1 - spark.memory.storageFraction) * 1/#taskNum share of the unified memory
region), which can lead to out-of-memory errors.
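For context, the relevant check looks roughly like the following (a simplified
sketch paraphrased from `Spillable.maybeSpill()` in the Spark 1.6/2.0 source;
spill bookkeeping and the force-spill path are omitted):

```scala
// Simplified sketch of Spillable.maybeSpill() (paraphrased; bookkeeping omitted).
// `currentMemory` is the estimated size of the in-memory collection;
// `myMemoryThreshold` starts small and grows as execution memory is granted.
protected def maybeSpill(collection: C, currentMemory: Long): Boolean = {
  var shouldSpill = false
  if (elementsRead % 32 == 0 && currentMemory >= myMemoryThreshold) {
    // Rather than spilling immediately, try to raise the threshold by
    // claiming up to 2 * currentMemory from the execution memory pool.
    val amountToRequest = 2 * currentMemory - myMemoryThreshold
    val granted = acquireMemory(amountToRequest)
    myMemoryThreshold += granted
    // Spill only if the pool could not grant enough memory.
    shouldSpill = currentMemory >= myMemoryThreshold
  }
  if (shouldSpill) {
    spill(collection)
    releaseMemory()
  }
  shouldSpill
}
```

Note that this check runs only once every 32 records, and even when it does
run, any size expansion has already been allocated on the JVM heap by the time
the spill decision is made.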
Example: suppose the current spillable threshold is 250MB while the
AppendOnlyMap occupies 200MB. At this point, an incoming key/value record
triggers AppendOnlyMap's size expansion (the map has reached its growth
threshold). After expansion, the AppendOnlyMap may occupy about 400MB (or
slightly less), far larger than both the spillable threshold and the execution
memory limit; the growth behavior behind this jump is sketched below.
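The jump comes from the hash table's doubling strategy. Below is a simplified
sketch paraphrased from AppendOnlyMap (field and method names follow the
source):

```scala
// Simplified sketch of AppendOnlyMap's growth path (paraphrased from the source).
private val LOAD_FACTOR = 0.7
private var capacity = 64                                  // number of slots
private var growThreshold = (LOAD_FACTOR * capacity).toInt
private var curSize = 0

// Called each time a new key is inserted.
private def incrementSize(): Unit = {
  curSize += 1
  if (curSize > growThreshold) {
    growTable()  // allocates a backing array of 2 * capacity and rehashes,
                 // roughly doubling the map's footprint in one step
  }
}
```

Because growTable() allocates the new array while the old one is still live,
the transient usage during the rehash is even higher than the doubled
steady-state size, and none of this is accounted against `myMemoryThreshold`
until the next spill check.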