Yup, but I'm wondering what happens when an executor does get removed while we're using Tachyon. Since the data is in off-heap storage rather than stored in the executor, will the cached data still be available?
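
For concreteness, the setup in question would look roughly like the following spark-defaults.conf fragment. This is only a sketch: the property names are the ones documented for Spark 1.5, but the values (the 60s timeout, the Tachyon master address) are assumptions picked for illustration.

```properties
# Sketch only; values are illustrative assumptions, not recommendations.
spark.dynamicAllocation.enabled                    true
# Dynamic allocation requires the external shuffle service.
spark.shuffle.service.enabled                      true
# Defaults to infinity, so executors holding cached blocks are never removed for being idle.
spark.dynamicAllocation.cachedExecutorIdleTimeout  60s
# Assumed Tachyon master address used for OFF_HEAP (external block store) storage.
spark.externalBlockStore.url                       tachyon://localhost:19998
```

RDDs would then be persisted with StorageLevel.OFF_HEAP. Whether those blocks stay readable after the executor that wrote them is removed is exactly the open question.
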
On Tue, Nov 3, 2015 at 4:57 PM Ryan Williams <ryan.blake.willi...@gmail.com> wrote:

> fwiw, I think that having cached RDD partitions prevents executors from
> being removed under dynamic allocation by default; see SPARK-8958
> <https://issues.apache.org/jira/browse/SPARK-8958>. The
> "spark.dynamicAllocation.cachedExecutorIdleTimeout" config
> <http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation>
> controls this.
>
> On Fri, Oct 30, 2015 at 12:14 PM Justin Uang <justin.u...@gmail.com>
> wrote:
>
>> Hey guys,
>>
>> According to the docs for 1.5.1, when an executor is removed for dynamic
>> allocation, the cached data is gone. If I use off-heap storage like
>> tachyon, conceptually there isn't this issue anymore, but is the cached
>> data still available in practice? This would be great because then we would
>> be able to set spark.dynamicAllocation.cachedExecutorIdleTimeout to be
>> quite small.
>>
>> ==================
>> In addition to writing shuffle files, executors also cache data either on
>> disk or in memory. When an executor is removed, however, all cached data
>> will no longer be accessible. There is currently not yet a solution for
>> this in Spark 1.2. In future releases, the cached data may be preserved
>> through an off-heap storage similar in spirit to how shuffle files are
>> preserved through the external shuffle service.
>> ==================
>>