Hi there, I was wondering if someone could explain how the cache() function works in Spark, specifically in these cases:
(1) Suppose I have a huge file, say 1 TB, that cannot fit entirely in memory. What happens if I create an RDD from this file and call cache()?

(2) If this works in Spark, it must be storing only part of the data. Which part of the data is kept in memory, and in particular, does newly cached data evict older data, the way a typical cache works?

(3) What happens if I load one RDD and cache it, then another, and so on? Will the newer RDDs evict the older RDDs already cached in memory?

Thanks very much.
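To make the scenario concrete, here is a minimal sketch of what I mean, using the Scala RDD API (the HDFS paths and object name are just placeholders I made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CacheScenario {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cache-scenario")
    val sc = new SparkContext(conf)

    // (1) Load a file far larger than cluster memory and ask Spark to cache it.
    // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY).
    val big = sc.textFile("hdfs:///data/huge-1tb-file")
    big.cache()
    big.count()  // the first action is what actually materializes the cache

    // (3) Now cache a second RDD on top of the first.
    val other = sc.textFile("hdfs:///data/another-file").cache()
    other.count()

    sc.stop()
  }
}

I am also curious whether using persist(StorageLevel.MEMORY_AND_DISK) instead of the plain cache() above would change the eviction behavior in these cases.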