fwiw, I think that having cached RDD partitions prevents executors from being removed under dynamic allocation by default; see SPARK-8958 <https://issues.apache.org/jira/browse/SPARK-8958>. The "spark.dynamicAllocation.cachedExecutorIdleTimeout" config <http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation> controls this.
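
For reference, a minimal sketch of how that might be wired up (Scala; the config keys are from the dynamic allocation docs, but the timeout values here are purely illustrative, not recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("dynamic-allocation-example")
      .set("spark.dynamicAllocation.enabled", "true")
      // dynamic allocation requires the external shuffle service, so shuffle
      // files outlive removed executors
      .set("spark.shuffle.service.enabled", "true")
      // idle executors without cached data are removed after this timeout
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
      // executors holding cached partitions use this separate timeout instead;
      // its default is effectively infinite, which is why cached executors
      // aren't removed unless you set it explicitly
      .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")

    val sc = new SparkContext(conf)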
On Fri, Oct 30, 2015 at 12:14 PM Justin Uang <justin.u...@gmail.com> wrote:
> Hey guys,
>
> According to the docs for 1.5.1, when an executor is removed for dynamic
> allocation, the cached data is gone. If I use off-heap storage like
> tachyon, conceptually there isn't this issue anymore, but is the cached
> data still available in practice? This would be great because then we would
> be able to set spark.dynamicAllocation.cachedExecutorIdleTimeout to be
> quite small.
>
> ==================
> In addition to writing shuffle files, executors also cache data either on
> disk or in memory. When an executor is removed, however, all cached data
> will no longer be accessible. There is currently not yet a solution for
> this in Spark 1.2. In future releases, the cached data may be preserved
> through an off-heap storage similar in spirit to how shuffle files are
> preserved through the external shuffle service.
> ==================