Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Justin Uang
Cool, thanks for the dev insight into what parts of the codebase are worthwhile, and which are not =)

On Tue, Nov 3, 2015 at 10:25 PM Reynold Xin wrote:
> It is quite a bit of work. Again, I think going through the file system
> API is more ideal in the long run. In the long run, I don't even th

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Reynold Xin
It is quite a bit of work. Again, I think going through the file system API is more ideal in the long run. In the long run, I don't even think the current offheap API makes much sense, and we should consider just removing it to simplify things.

On Tue, Nov 3, 2015 at 1:20 PM, Justin Uang wrote:

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Justin Uang
Alright, we'll just stick with normal caching then. Just for future reference, how much work would it be to get it to retain the partitions in Tachyon? This is especially helpful in a multitenant situation, where many users each have their own persistent spark contexts, but where the notebooks can

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Reynold Xin
It is lost, unfortunately (although it can be recomputed automatically).

On Tue, Nov 3, 2015 at 1:13 PM, Justin Uang wrote:
> Thanks for your response. I was worried about #3, vs being able to use the
> objects directly. #2 seems to be the dealbreaker for my use case right?
> Even if I am using
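To make Reynold's answer concrete, here is a toy Python model. It is a hypothetical sketch, not Spark's block manager: `ToyBlockTracker` and its methods are invented for illustration, and the assumption that the driver knows about each cached partition only through the executor that registered it is mine. Under that assumption, removing the executor makes the partition unavailable to Spark even if its bytes sit in an external store like Tachyon, and the next read falls back to recomputing it from lineage.

```python
from typing import Callable, Dict, List


class ToyBlockTracker:
    """Hypothetical model of driver-side cache bookkeeping (not Spark's API).

    Each cached partition is known to the driver only through the executor
    that registered it, so removing that executor drops the entry.
    """

    def __init__(self, compute: Callable[[int], List[int]]) -> None:
        self._compute = compute                # stands in for the RDD lineage
        self._owner: Dict[int, str] = {}       # partition id -> executor id
        self._data: Dict[int, List[int]] = {}  # partition id -> cached rows
        self.recomputations = 0                # how often lineage was replayed

    def cache(self, partition: int, executor: str) -> None:
        """Compute a partition and register it under the given executor."""
        self._data[partition] = self._compute(partition)
        self._owner[partition] = executor

    def remove_executor(self, executor: str) -> None:
        """Forget every block the executor registered.

        In this model that happens regardless of where the bytes live
        (JVM heap or an external store such as Tachyon).
        """
        lost = [p for p, owner in self._owner.items() if owner == executor]
        for p in lost:
            del self._owner[p]
            del self._data[p]

    def get(self, partition: int) -> List[int]:
        """Return cached data, or recompute it from lineage on a miss."""
        if partition in self._owner:
            return self._data[partition]
        self.recomputations += 1
        return self._compute(partition)


tracker = ToyBlockTracker(lambda p: [p * 10 + i for i in range(3)])
tracker.cache(0, "exec-1")
tracker.remove_executor("exec-1")
print(tracker.get(0), tracker.recomputations)  # [0, 1, 2] 1
```

The data still comes back after the executor is gone, but only because it was recomputed, which matches the "lost, but can be recomputed automatically" behavior described above.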

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Justin Uang
Thanks for your response. I was worried about #3, versus being able to use the objects directly. #2 seems to be the dealbreaker for my use case, right? Even if I am using Tachyon for caching, if an executor is lost, is that partition then lost for the purposes of Spark?

On Tue, Nov 3, 2015 at 5:53 P

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Reynold Xin
I don't think there is any special handling w.r.t. Tachyon vs. in-heap caching. As a matter of fact, I think the current offheap caching implementation is pretty bad, because:

1. There is no namespace sharing in offheap mode.
2. Similar to 1, you cannot recover the offheap memory once Spark driver o

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Justin Uang
Yup, but I'm wondering what happens when an executor does get removed while we're using Tachyon. Will the cached data still be available, given that with off-heap storage the data isn't stored in the executor?

On Tue, Nov 3, 2015 at 4:57 PM Ryan Williams wrote:
> fwiw, I think that hav

Re: Off-heap storage and dynamic allocation

2015-11-03 Thread Ryan Williams
fwiw, I think that having cached RDD partitions prevents executors from being removed under dynamic allocation by default; see SPARK-8958. The "spark.dynamicAllocation.cachedExecutorIdleTimeout" config
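A minimal sketch of the policy Ryan describes, written as a toy Python function rather than Spark code: `should_remove` is hypothetical, and the defaults used here (60 s for `spark.dynamicAllocation.executorIdleTimeout`, infinity for `spark.dynamicAllocation.cachedExecutorIdleTimeout`) reflect my reading of the Spark docs from that era, so check them against your version. An idle executor holding cached blocks only becomes eligible for removal once the cached-executor timeout is set to a finite value.

```python
import math

# Toy model only; this is NOT Spark's ExecutorAllocationManager.
# Assumed defaults, per my reading of the Spark 1.5-era docs:
#   spark.dynamicAllocation.executorIdleTimeout       -> 60 s
#   spark.dynamicAllocation.cachedExecutorIdleTimeout -> infinity
EXECUTOR_IDLE_TIMEOUT_S = 60.0
CACHED_EXECUTOR_IDLE_TIMEOUT_S = math.inf


def should_remove(idle_seconds: float,
                  has_cached_blocks: bool,
                  idle_timeout: float = EXECUTOR_IDLE_TIMEOUT_S,
                  cached_idle_timeout: float = CACHED_EXECUTOR_IDLE_TIMEOUT_S) -> bool:
    """Return True if an idle executor would be released under this model.

    Executors holding cached blocks use the (by default infinite) cached
    timeout, so they are never released unless that value is lowered.
    """
    timeout = cached_idle_timeout if has_cached_blocks else idle_timeout
    return idle_seconds >= timeout


print(should_remove(120, has_cached_blocks=False))        # True
print(should_remove(10_000, has_cached_blocks=True))      # False
print(should_remove(900, True, cached_idle_timeout=600))  # True
```

In a real job, the corresponding knob would be setting `spark.dynamicAllocation.cachedExecutorIdleTimeout` (for example to `600s`) so that executors with cached partitions can eventually be reclaimed, at the cost of losing those cached partitions.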

Off-heap storage and dynamic allocation

2015-10-30 Thread Justin Uang
Hey guys, according to the docs for 1.5.1, when an executor is removed by dynamic allocation, its cached data is gone. If I use off-heap storage like Tachyon, conceptually this issue goes away, but is the cached data still available in practice? This would be great because then we would