It is lost, unfortunately (although it can be recomputed automatically).
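
For illustration, a rough sketch of the two approaches discussed below; the tachyon:// URL and paths are placeholders, not a tested setup:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object TachyonCachingSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("tachyon-sketch"))

        val rdd = sc.textFile("hdfs:///data/input").map(_.toUpperCase)

        // Option A: OFF_HEAP caching (Tachyon-backed in 1.5). The blocks are
        // tracked by the owning executor, so they are dropped when that
        // executor is removed and get recomputed from lineage on next use.
        rdd.persist(StorageLevel.OFF_HEAP)
        rdd.count()

        // Option B: treat Tachyon as a plain file system. The data survives
        // executor loss because it is just files, at the cost of an explicit
        // write and read. The tachyon:// address below is a placeholder.
        rdd.saveAsObjectFile("tachyon://tachyon-master:19998/tmp/cached-rdd")
        val reloaded =
          sc.objectFile[String]("tachyon://tachyon-master:19998/tmp/cached-rdd")
        reloaded.count()

        sc.stop()
      }
    }
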
On Tue, Nov 3, 2015 at 1:13 PM, Justin Uang <justin.u...@gmail.com> wrote:

> Thanks for your response. I was worried about #3, vs being able to use the
> objects directly. #2 seems to be the dealbreaker for my use case, right?
> Even if I am using Tachyon for caching, if an executor is lost, then that
> partition is lost for the purposes of Spark?
>
> On Tue, Nov 3, 2015 at 5:53 PM Reynold Xin <r...@databricks.com> wrote:
>
>> I don't think there is any special handling w.r.t. Tachyon vs in-heap
>> caching. As a matter of fact, I think the current offheap caching
>> implementation is pretty bad, because:
>>
>> 1. There is no namespace sharing in offheap mode
>> 2. Similar to 1, you cannot recover the offheap memory once the Spark
>> driver or executor crashes
>> 3. It requires expensive serialization to go offheap
>>
>> It would've been simpler to just treat Tachyon as a normal file system,
>> and use it that way to at least satisfy 1 and 2, and also substantially
>> simplify the internals.
>>
>> On Tue, Nov 3, 2015 at 7:59 AM, Justin Uang <justin.u...@gmail.com>
>> wrote:
>>
>>> Yup, but I'm wondering what happens when an executor does get removed,
>>> but when we're using Tachyon. Will the cached data still be available,
>>> since we're using off-heap storage, so the data isn't stored in the
>>> executor?
>>>
>>> On Tue, Nov 3, 2015 at 4:57 PM Ryan Williams <
>>> ryan.blake.willi...@gmail.com> wrote:
>>>
>>>> fwiw, I think that having cached RDD partitions prevents executors from
>>>> being removed under dynamic allocation by default; see SPARK-8958
>>>> <https://issues.apache.org/jira/browse/SPARK-8958>. The
>>>> "spark.dynamicAllocation.cachedExecutorIdleTimeout" config
>>>> <http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation>
>>>> controls this.
>>>>
>>>> On Fri, Oct 30, 2015 at 12:14 PM Justin Uang <justin.u...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey guys,
>>>>>
>>>>> According to the docs for 1.5.1, when an executor is removed for
>>>>> dynamic allocation, the cached data is gone. If I use off-heap storage
>>>>> like Tachyon, conceptually this issue goes away, but is the cached data
>>>>> still available in practice? This would be great because then we would
>>>>> be able to set spark.dynamicAllocation.cachedExecutorIdleTimeout to be
>>>>> quite small.
>>>>>
>>>>> ==================
>>>>> In addition to writing shuffle files, executors also cache data either
>>>>> on disk or in memory. When an executor is removed, however, all cached
>>>>> data will no longer be accessible. There is currently not yet a solution
>>>>> for this in Spark 1.2. In future releases, the cached data may be
>>>>> preserved through an off-heap storage similar in spirit to how shuffle
>>>>> files are preserved through the external shuffle service.
>>>>> ==================
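
For reference, a minimal sketch of the dynamic-allocation settings mentioned above; the timeout values are illustrative only:

    import org.apache.spark.SparkConf

    // Illustrative values: with no long-lived cached blocks on executors,
    // the cached-executor idle timeout can be kept short so idle executors
    // are reclaimed quickly under dynamic allocation.
    val conf = new SparkConf()
      .setAppName("dynamic-allocation-sketch")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")                     // required for dynamic allocation
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")        // default: 60s
      .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "120s") // default: infinity
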