Others have also asked for this on the mailing list, and hence there's a related JIRA: https://issues.apache.org/jira/browse/SPARK-1762. Ankur brings up a good point: any current implementation of in-memory shuffles will compete with application RDD blocks for memory. I think we should definitely add this at some point. As for the timeline, we already have many features lined up for 1.1, so it will likely come after that.

2014-07-07 10:13 GMT-07:00 Ankur Dave <ankurd...@gmail.com>:

> I think tiers/priorities for caching are a very good idea and I'd be
> interested to see what others think. In addition to letting libraries cache
> RDDs liberally, it could also unify memory management across other parts of
> Spark. For example, small shuffles benefit from explicitly keeping the
> shuffle outputs in memory rather than writing them to disk, possibly due to
> filesystem overhead. To prevent in-memory shuffle outputs from competing
> with application RDDs, Spark could mark them as lower-priority and specify
> that they should be dropped to disk when memory runs low.
>
> Ankur <http://www.ankurdave.com/>
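
To make the tiering idea above concrete, here is a rough toy model in Python. It is not Spark code and does not reflect any existing Spark API: `TieredStore`, the `LOW`/`HIGH` tiers, and the byte sizes are all made up for illustration. It only shows the proposed policy, where low-priority blocks (such as shuffle outputs) are dropped to disk before application RDD blocks when memory runs low.

```python
# Hypothetical priority tiers: lower values are dropped to disk first.
LOW, HIGH = 0, 1


class TieredStore:
    """Toy in-memory block store that spills the lowest-priority blocks first."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.memory = {}  # block_id -> (priority, size), in insertion order
        self.disk = {}    # block_id -> size, stands in for blocks dropped to disk

    def put(self, block_id, size, priority):
        while self.used + size > self.capacity:
            # Lowest priority wins; ties go to the oldest block in memory.
            victim = min(self.memory, key=lambda b: self.memory[b][0], default=None)
            if victim is None or self.memory[victim][0] > priority:
                # Nothing of equal or lower priority is left to evict,
                # so the incoming block goes to disk instead.
                self.disk[block_id] = size
                return
            _, freed = self.memory.pop(victim)
            self.used -= freed
            self.disk[victim] = freed
        self.memory[block_id] = (priority, size)
        self.used += size


store = TieredStore(capacity_bytes=100)
store.put("rdd_0", 60, HIGH)      # application RDD block
store.put("shuffle_0", 30, LOW)   # in-memory shuffle output
store.put("rdd_1", 40, HIGH)      # memory is tight: shuffle_0 is dropped to disk
print(sorted(store.memory), sorted(store.disk))
```

In this sketch a low-priority block never displaces a higher-priority one; a real implementation would also have to decide how such tiers interact with the existing storage levels.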