The batch RDD you load from a file will be kept around for as long as the Spark 
application (i.e. the SparkContext) is running – you can also flag it explicitly 
as persisted, e.g. in memory and/or on disk.
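For example, flagging it explicitly could look like this (a minimal sketch – the file path and parsing are hypothetical, and `sc` is assumed to be an already-created SparkContext):

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input path and record format; keeps the RDD cached in
// memory, spilling partitions to disk when they do not fit.
val myRDD = sc.textFile("hdfs:///data/input.txt")
  .map { line =>
    val parts = line.split("\t")
    (parts(0), parts(1).toLong)   // (String, Long) tuples, as in the question
  }
  .persist(StorageLevel.MEMORY_AND_DISK)
```

Without an explicit `persist`, the RDD's partitions are recomputed from the file on each action; with it, Spark keeps them materialized until you call `unpersist` or the application shuts down.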

 

From: Anand Nalya [mailto:[email protected]] 
Sent: Tuesday, July 7, 2015 12:34 PM
To: [email protected]
Subject: 

 

Hi,

 

Suppose I have an RDD that is loaded from some file and then I also have a 
DStream that has data coming from some stream. I want to keep union some of the 
tuples from the DStream into my RDD. For this I can use something like this:

 

  var myRDD: RDD[(String, Long)] = sc.fromText...
  dstream.foreachRDD { rdd =>
    myRDD = myRDD.union(rdd.filter(myfilter))
  }

 

My question is: for how long will Spark keep the RDDs underlying the DStream 
around? Is there some configuration knob that can control that?

 

Regards,

Anand