Maybe your memory isn't large enough to hold the current RDD together with all the previous ones? RDDs that are cached or persisted have to be unpersisted explicitly; there is no auto-unpersist (perhaps that will change in the 1.0 release?). Also be careful: calling cache() or persist() does not mean the RDD is materialised right away, it is only stored once an action actually computes it. I personally found the following pattern of usage simpler:
    val mwzNew = mwz.mapPartitions(...).cache()
    mwzNew.count() // or mwzNew.foreach(x => {}) -- force evaluation of the new RDD so it is materialized
    mwz.unpersist() // drop the old, no longer needed RDD from memory and disk

2014-05-04 5:16 GMT+02:00 Earthson <earthson...@gmail.com>:

> I'm using Spark for an LDA implementation. I need to cache an RDD for the
> next step of Gibbs sampling, and the previously cached RDD could then be
> uncached. Something like an LRU cache should drop the previous cache,
> since it is never used again, but the caching doesn't behave as expected.
>
> Here is the code :)
> https://github.com/Earthson/sparklda/blob/master/src/main/scala/net/earthson/nlp/lda/lda.scala#L99
>
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n5292/sparklda_cache1.png
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n5292/sparklda_cache2.png
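Applied to an iterative job like your Gibbs sampling, the same idea is: persist the new RDD, force it with an action, then unpersist the previous one, so only one iteration's data stays cached at a time. Below is a rough sketch of that loop (my own illustration, not taken from your repository; `step` stands in for whatever one sampling pass does):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    // Run `step` for a number of rounds. Each new RDD is materialised
    // before its parent is unpersisted, so the parent's cached blocks are
    // still available while the child is computed, and only the latest
    // iteration stays cached afterwards.
    def iterate[T](initial: RDD[T], iterations: Int)(step: RDD[T] => RDD[T]): RDD[T] = {
      var current: RDD[T] = initial.persist(StorageLevel.MEMORY_AND_DISK)
      current.foreach(_ => ())   // force evaluation so the blocks actually exist
      for (_ <- 1 to iterations) {
        val next = step(current).persist(StorageLevel.MEMORY_AND_DISK)
        next.foreach(_ => ())    // materialise the child first...
        current.unpersist()      // ...then drop the parent's blocks
        current = next
      }
      current
    }

The order matters: if you unpersist the old RDD before the new one has been computed, Spark may have to recompute the old lineage from scratch.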