You are mixing up storage and execution memory. The following is the sequence of storage retention and eviction:
- Execution and storage share a unified region (M).
- When no execution is underway, storage can take up the whole of M. The reverse also holds: when nothing is cached, execution can use all of M.
- When both execution and storage are active, a priority order decides which of them can claim space in M.
- When execution starts and needs a portion of M that storage already occupies, it evicts cached blocks to reclaim that space.
- This eviction cannot reclaim the whole of M for execution, though. There is a reserved threshold R (a subregion of M): execution may evict cached blocks only until total storage usage falls to R. Once storage usage is down to R, eviction stops.
- In short, R is the subregion of M where storage always has priority over execution; cached blocks within it are never released to execution.

Regards,
Subhasis Mukherjee

________________________________
From: Sreyan Chakravarty <sreya...@gmail.com>
Sent: Wednesday, August 14, 2024 9:00:45 PM
To: user@spark.apache.org <user@spark.apache.org>
Subject: Need help understanding tuning docs

https://spark.apache.org/docs/latest/tuning.html#memory-management-overview

What is the meaning of:

"Execution may evict storage if necessary, but only until total storage memory usage falls under a certain threshold (R). In other words, R describes a subregion within M where cached blocks are never evicted."

This seems contradictory. In simple terms, I read it as: once total memory usage crosses a threshold (R), Spark will start evicting storage in an LRU fashion.

But the line:

"In other words, R describes a subregion within M where cached blocks are never evicted."

seems to contradict that. What is going on?

--
Regards,
Sreyan Chakravarty
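
To make the eviction rule from the reply at the top of this thread concrete, here is a rough Python sketch. It is a toy model under stated assumptions, not Spark's actual memory manager: the `UnifiedMemory` class and the `regions` helper are hypothetical names invented for illustration. The sizing formula uses the 300 MiB reservation and the `spark.memory.fraction` (default 0.6) and `spark.memory.storageFraction` (default 0.5) settings described on the tuning page linked above.

```python
from dataclasses import dataclass

RESERVED_MIB = 300  # heap set aside before M is computed, per the tuning docs


def regions(heap_mib, memory_fraction=0.6, storage_fraction=0.5):
    """Return (M, R) in MiB for a given executor heap size.

    memory_fraction stands in for spark.memory.fraction and
    storage_fraction for spark.memory.storageFraction.
    """
    m = (heap_mib - RESERVED_MIB) * memory_fraction
    r = m * storage_fraction
    return m, r


@dataclass
class UnifiedMemory:
    """Toy model of the unified region M (hypothetical, for illustration only)."""
    m: float                    # size of the unified region M
    r: float                    # protected storage threshold R (R <= M)
    storage_used: float = 0.0
    execution_used: float = 0.0

    def acquire_execution(self, need):
        """Try to grant `need` MiB to execution.

        Returns (granted, evicted). Storage is evicted only while its
        usage is above R; once it is down to R, eviction stops.
        """
        free = self.m - self.storage_used - self.execution_used
        shortfall = max(0.0, need - free)
        evictable = max(0.0, self.storage_used - self.r)
        evicted = min(shortfall, evictable)
        self.storage_used -= evicted
        granted = min(need, free + evicted)
        self.execution_used += granted
        return granted, evicted


if __name__ == "__main__":
    # M = 1000 MiB, R = 500 MiB, and the cache currently holds 800 MiB.
    mem = UnifiedMemory(m=1000, r=500, storage_used=800)
    # Execution asks for 600 MiB: 200 MiB is free, so it evicts 300 MiB of
    # cached blocks (800 -> 500) and gets 500 MiB in total.
    print(mem.acquire_execution(600))
    # A further request evicts nothing, because storage is already down to R.
    print(mem.acquire_execution(100))
```

In this model the 500 MiB of cached blocks that remain after the first request are exactly the "subregion within M where cached blocks are never evicted" by execution; only the 300 MiB that storage held above R was reclaimable.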