Hi, I am trying to cache 2 GB of data and implement the following procedure. Is it necessary to cache rdd2 when rdd1 is already cached? This is what I did to cache them:
rdd1 = sc.textFile("hdfs...").cache()
rdd2 = rdd1.filter(userDefinedFunc1).cache()
rdd3 = rdd1.filter(userDefinedFunc2).cache()
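
To make the question concrete, here is a toy pure-Python model of how I understand lazy evaluation and caching. It is not Spark code: ToyRDD, counting_pred and predicate_calls are made-up names for illustration only. If this model matches what Spark does, caching rdd1 only avoids re-reading the file, and the filter would still be re-run on every action unless rdd2 is cached as well.

```python
class ToyRDD:
    """Toy stand-in for a lazily evaluated RDD (hypothetical, not the Spark API)."""

    def __init__(self, compute):
        self._compute = compute    # function that produces the full list of records
        self._use_cache = False    # set by cache()
        self._cached = None        # materialized records, filled on first collect()

    def cache(self):
        # Only marks the dataset for caching; nothing is computed yet.
        self._use_cache = True
        return self

    def filter(self, pred):
        # Lazy: the child recomputes from the parent each time it is evaluated.
        return ToyRDD(lambda: [x for x in self.collect() if pred(x)])

    def collect(self):
        if self._use_cache:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()


calls = {"n": 0}


def counting_pred(x):
    # Counts how often the user-defined filter function is evaluated.
    calls["n"] += 1
    return x % 2 == 0


def predicate_calls(cache_child):
    """Run two actions on the filtered dataset and report filter evaluations."""
    calls["n"] = 0
    parent = ToyRDD(lambda: list(range(10))).cache()   # plays the role of rdd1
    child = parent.filter(counting_pred)               # plays the role of rdd2
    if cache_child:
        child.cache()
    child.collect()
    child.collect()
    return calls["n"]
```

In this model, the filter function runs again for every action on the uncached child, and only once when the child is cached too. Whether that cost is worth the extra memory would depend on how often rdd2 and rdd3 are reused.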
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Proper-caching-method-tp4206.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
