I think tiers/priorities for caching are a very good idea and I'd be
interested to see what others think. In addition to letting libraries cache
RDDs liberally, it could also unify memory management across other parts of
Spark. For example, small shuffles benefit from explicitly keeping the
shuffle outputs in memory rather than writing it to disk, possibly due to
filesystem overhead. To prevent in-memory shuffle outputs from competing
with application RDDs, Spark could mark them as lower-priority and specify
that they should be dropped to disk when memory runs low.

Ankur <http://www.ankurdave.com/>

Reply via email to