Re: tiers of caching

Andrew Or Mon, 07 Jul 2014 10:42:21 -0700

Others have also asked for this on the mailing list, and there's a related
JIRA: https://issues.apache.org/jira/browse/SPARK-1762. Ankur brings up a
good point: any current implementation of in-memory shuffles will compete
with application RDD blocks for memory. I think we should definitely add
this at some point. As for a timeline, we already have many features lined
up for 1.1, so it will likely land after that.
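
To make the idea under discussion concrete, here is a minimal sketch of a
priority-tiered block store. It is a toy model, not Spark's actual block
manager: the TieredCache class, the two priority constants, and the block
names are all hypothetical and exist only for illustration. Blocks carry a
priority, and when memory runs low the lowest-priority blocks (oldest
first) are dropped to a stand-in "disk" before any higher-priority block.

```python
from collections import OrderedDict

# Hypothetical tiers: a lower number is evicted first.
SHUFFLE_PRIORITY = 0      # in-memory shuffle outputs
APPLICATION_PRIORITY = 1  # RDDs cached explicitly by the application


class TieredCache:
    """Toy block store that spills the lowest-priority blocks to 'disk' first."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # block_id -> (priority, size); insertion order doubles as block age.
        self.memory = OrderedDict()
        # Stand-in for on-disk storage: block_id -> (priority, size).
        self.disk = {}

    def put(self, block_id, size, priority):
        # Replace any previous copy of this block.
        if block_id in self.memory:
            self.used_bytes -= self.memory.pop(block_id)[1]
        self.disk.pop(block_id, None)

        # A block larger than the whole cache can never fit in memory.
        if size > self.capacity_bytes:
            self.disk[block_id] = (priority, size)
            return

        self.memory[block_id] = (priority, size)
        self.used_bytes += size
        self._spill_until_it_fits()

    def _spill_until_it_fits(self):
        while self.used_bytes > self.capacity_bytes:
            # min() returns the first minimum in iteration order, so among
            # blocks of equal priority the oldest one is spilled.
            victim = min(self.memory, key=lambda b: self.memory[b][0])
            priority, size = self.memory.pop(victim)
            self.disk[victim] = (priority, size)
            self.used_bytes -= size


cache = TieredCache(capacity_bytes=100)
cache.put("rdd_0", 60, APPLICATION_PRIORITY)
cache.put("shuffle_0", 30, SHUFFLE_PRIORITY)
cache.put("rdd_1", 40, APPLICATION_PRIORITY)  # over capacity: shuffle_0 spills
```

In this model a shuffle block never pushes an application RDD out of
memory; if the cache is already full of higher-priority blocks, a new
shuffle block is the one that goes to disk.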



2014-07-07 10:13 GMT-07:00 Ankur Dave <ankurd...@gmail.com>:

> I think tiers/priorities for caching are a very good idea and I'd be
> interested to see what others think. In addition to letting libraries cache
> RDDs liberally, it could also unify memory management across other parts of
> Spark. For example, small shuffles benefit from explicitly keeping the
> shuffle outputs in memory rather than writing them to disk, possibly due to
> filesystem overhead. To prevent in-memory shuffle outputs from competing
> with application RDDs, Spark could mark them as lower-priority and specify
> that they should be dropped to disk when memory runs low.
>
> Ankur <http://www.ankurdave.com/>
>
>
