You cannot assume that caching will always reduce execution time, especially when the data set is large. It appears that if too much memory is used for caching, less memory is left for the computation itself, so there has to be a balance between the two.
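To make the balance concrete, here is a rough toy model in plain Python. It is not Spark code and does not reflect how Spark actually manages memory: the function `estimated_time`, the memory sizes, and the per-GB cost factors are all assumptions picked only to illustrate the shape of the trade-off.

```python
# Toy cost model, NOT Spark's memory manager. Every number and the cost
# formula itself are illustrative assumptions.

def estimated_time(cache_fraction, total_mem_gb=8.0, dataset_gb=20.0,
                   working_set_gb=4.0, passes=5):
    """Return a unitless cost for `passes` iterations over the data set."""
    cache_gb = cache_fraction * total_mem_gb   # memory given to the cache
    compute_gb = total_mem_gb - cache_gb       # memory left for computation

    cached_gb = min(dataset_gb, cache_gb)      # part of the data that fits
    uncached_gb = dataset_gb - cached_gb       # part re-read or recomputed

    # Assumed: uncached data costs 10x more per GB than cached data.
    read_cost = cached_gb * 1.0 + uncached_gb * 10.0

    # Assumed penalty when the computation's working set does not fit in
    # the remaining memory (a stand-in for spilling or GC pressure).
    shortfall_gb = max(0.0, working_set_gb - compute_gb)
    spill_cost = shortfall_gb * 50.0

    return passes * (read_cost + spill_cost)


fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
costs = {f: estimated_time(f) for f in fractions}
best = min(costs, key=costs.get)

for f in fractions:
    print(f"cache fraction {f:.2f}: cost {costs[f]:.0f}")
print("lowest cost at cache fraction", best)
```

With these assumed numbers, the cost drops as the cache grows up to half of the memory, then climbs past the no-cache baseline once the cache starts squeezing out the computation. The real crossover point depends entirely on the workload, so it has to be measured.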
Page 33 of this thesis from KTH discusses this trade-off: http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf

Best,
Gaurav Jain
Master's Student, D-INFK
ETH Zurich