Re: tiers of caching

2014-07-07 Thread Andrew Or
Others have also asked for this on the mailing list, and hence there's a related JIRA: https://issues.apache.org/jira/browse/SPARK-1762. Ankur brings up a good point in that any current implementation of in-memory shuffles will compete with application RDD blocks. I think we should definitely add t

Re: tiers of caching

2014-07-07 Thread Ankur Dave
I think tiers/priorities for caching are a very good idea and I'd be interested to see what others think. In addition to letting libraries cache RDDs liberally, it could also unify memory management across other parts of Spark. For example, small shuffles benefit from explicitly keeping the shuffle

tiers of caching

2014-07-07 Thread Koert Kuipers
make sense to create tiers of caching, to distinguish explicitly cached RDDs by the application from RDDs that are temporary cached by algos, so as to make sure these temporary caches don't push application RDDs out of memory?