Hi,
Suppose I have an RDD that is loaded from some file, and I also have a
DStream with data coming in from some stream. I want to keep unioning some
of the tuples from the DStream into my RDD. For this I can use something
like:
var myRDD: RDD[(String, Long)] = sc.fromText...
dstream.foreachRDD { rdd =>
  myRDD = myRDD.union(rdd.filter(myfilter))
}
My question is: for how long will Spark keep the RDDs underlying the
DStream around? Is there some configuration knob that can control that?
Regards,
Anand