Hi, I am trying to cache 2 GB of data and run the procedure below on it. Is it necessary to cache rdd2, given that rdd1 is already cached? This is how I cache them:
rdd1 = textFile("hdfs...").cache()
rdd2 = rdd1.filter(userDefinedFunc1).cache()
rdd3 = rdd1.filter(userDefinedFunc2).cache()

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Proper-caching-method-tp4206.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
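To make the question concrete, here is a toy model in plain Python of what is being asked. It is not the Spark API: `ToyRDD`, `read_source`, and `user_defined_func1` are hypothetical stand-ins, and the sketch only assumes that Spark evaluates lazily, that cache() keeps a result after the first action, and that an uncached RDD is recomputed from its nearest cached parent on every action. Under those assumptions, caching rdd1 avoids re-reading the source, while caching rdd2 additionally avoids re-running the filter.

```python
# Toy stand-in for an RDD (NOT the Spark API): lazy, with optional caching.
class ToyRDD:
    def __init__(self, compute):
        self._compute = compute      # zero-argument function that produces the data
        self._cached = False
        self._data = None

    def cache(self):
        self._cached = True          # only marks the RDD; nothing is computed yet
        return self

    def filter(self, pred):
        # The child pulls its input from the parent each time it is computed.
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)])

    def collect(self):
        if self._cached:
            if self._data is None:
                self._data = self._compute()   # materialized on the first action
            return self._data
        return self._compute()       # uncached: recomputed on every action

    def count(self):
        return len(self.collect())


calls = {"read": 0, "func1": 0}

def read_source():                   # hypothetical stand-in for reading the HDFS file
    calls["read"] += 1
    return list(range(10))

def user_defined_func1(x):           # hypothetical stand-in for userDefinedFunc1
    calls["func1"] += 1
    return x % 2 == 0

# Case 1: rdd1 cached, rdd2 NOT cached, two actions on rdd2.
rdd1 = ToyRDD(read_source).cache()
rdd2 = rdd1.filter(user_defined_func1)
rdd2.count()
rdd2.count()
uncached_calls = dict(calls)

# Case 2: rdd1 cached, rdd2 cached as well, two actions on rdd2.
calls.update(read=0, func1=0)
rdd1 = ToyRDD(read_source).cache()
rdd2_cached = rdd1.filter(user_defined_func1).cache()
rdd2_cached.count()
rdd2_cached.count()
cached_calls = dict(calls)

print(uncached_calls, cached_calls)
```

In this model the source is read once in both cases, but the filter function runs over all ten elements twice when rdd2 is not cached and only once when it is. So whether the extra cache() is worth the memory would depend on how many actions reuse rdd2 and how expensive the filter is.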