Hi, I am trying to cache 2 GB of data and implement the following procedure.
I cached the RDDs as shown below. Is it necessary to cache rdd2, given that
rdd1 is already cached?

rdd1 = sc.textFile("hdfs...").cache()

rdd2 = rdd1.filter(userDefinedFunc1).cache()
rdd3 = rdd1.filter(userDefinedFunc2).cache()
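
A minimal sketch of the two options in question, assuming PySpark and an existing SparkContext passed in as `sc`. The names `build_rdds` and `cache_children` are hypothetical and used only for illustration. With only rdd1 cached, each action on rdd2 or rdd3 re-applies its filter to the cached rdd1; caching the children as well trades extra memory for skipping that step, which probably only matters if a filtered RDD is used by more than one action.

```python
def build_rdds(sc, path, func1, func2, cache_children=False):
    """Build the parent RDD and the two filtered RDDs.

    sc is assumed to be a live SparkContext; func1 and func2 stand in
    for the user-defined filter predicates from the post.
    """
    # The parent feeds both filters, so it is cached in either option.
    rdd1 = sc.textFile(path).cache()

    # filter() is lazy: nothing is read or computed until an action runs.
    rdd2 = rdd1.filter(func1)
    rdd3 = rdd1.filter(func2)

    if cache_children:
        # Optional: also keep the filtered results in memory, which helps
        # only when rdd2/rdd3 are each reused across several actions.
        rdd2 = rdd2.cache()
        rdd3 = rdd3.cache()

    return rdd1, rdd2, rdd3
```

Calling `build_rdds(sc, "hdfs...", userDefinedFunc1, userDefinedFunc2)` gives the parent-only variant; passing `cache_children=True` reproduces the code as posted.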






