You cannot assume that caching would always reduce the execution time,
especially if the data-set is large. It appears that if too much memory is
used for caching, then less memory is left for the actual computation
itself. There has to be a balance between the two. 

Page 33 of this thesis from KTH talks about this:
http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf

Best



-----
Gaurav Jain
Master's Student, D-INFK
ETH Zurich
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/rdd-cache-is-not-faster-tp7804p7835.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Reply via email to