AFAIK Spark has no public APIs to clean up those RDDs.

On Wed, Jan 25, 2017 at 11:30 PM, Andrew Milkowski <amgm2...@gmail.com> wrote:
> Hi Takeshi, thanks for the answer. It looks like Spark would free up old
> RDDs; however, using the admin UI we see, for example:
>
> Block ID: it corresponds with each receiver and a timestamp.
> For example, block input-0-1485275695898 is from receiver 0 and it was
> created at 1485275695898 (1/24/2017, 11:34:55 AM GMT-5:00).
> That corresponds with the start time.
>
> That block, even after running for a whole day, is still not being
> released! The RDDs in our scenario are Strings coming from a Kinesis
> stream.
>
> Is there a way to explicitly purge an RDD after the last step in the M/R
> process, once and for all?
>
> thanks much!
>
> On Fri, Jan 20, 2017 at 2:35 AM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> AFAIK, the blocks of minibatch RDDs are checked after every job
>> finishes, and older blocks are automatically removed (see:
>> https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L463).
>>
>> You can control this behaviour with StreamingContext#remember to some
>> extent.
>>
>> // maropu
>>
>>
>> On Fri, Jan 20, 2017 at 3:17 AM, Andrew Milkowski <amgm2...@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> Using Spark 2.0.2, while running a sample streaming app with Kinesis I
>>> noticed (in the admin UI Storage tab) that "Stream Blocks" for each
>>> worker keeps climbing up.
>>>
>>> Then also (on the same UI page), in the Blocks section I see blocks
>>> such as the one below
>>>
>>> input-0-1484753367056
>>>
>>> that are marked as Memory Serialized and that do not seem to be
>>> "released".
>>>
>>> The above eventually consumes executor memory, leading to an
>>> out-of-memory exception on some of them.
>>>
>>> Is there a way to "release" these blocks and free them up? The app is a
>>> sample m/r job.
>>>
>>> I attempted rdd.unpersist(false) in the code but that did not lead to
>>> the memory being freed up.
>>>
>>> thanks much in advance!
>>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>

--
---
Takeshi Yamamuro
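
For reference, below is a minimal sketch of the StreamingContext#remember
approach suggested in the thread. It is purely illustrative: the local
master, the socket source, the port, and the durations are placeholders
standing in for the actual Kinesis setup and batch settings, not the
original app.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object BlockRetentionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("block-retention-sketch")
      .setMaster("local[2]")                     // placeholder master
      // Left at its default of "true" so generated RDDs are unpersisted
      // automatically once they fall out of the remember window.
      .set("spark.streaming.unpersist", "true")

    // 10-second batches (placeholder batch interval).
    val ssc = new StreamingContext(conf, Seconds(10))

    // Keep generated RDDs (and their blocks) for at most ~2 minutes beyond
    // what the DStream operations themselves require; older blocks become
    // eligible for cleanup after each completed batch.
    ssc.remember(Minutes(2))

    // Stand-in source; in the original thread this was a Kinesis stream.
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      // Last step of the job for this batch; after it completes, the
      // batch's blocks are subject to the remember() window above.
      val count = rdd.count()
      println(s"processed $count records in this batch")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}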