Maybe your memory isn't large enough to hold the current RDD together with all the previous ones? RDDs that are cached or persisted have to be unpersisted explicitly; there is no auto-unpersist (perhaps that will change in the 1.0 release?). Also be careful: calling cache() or persist() does not mean the RDD is materialised right away, it is only stored once an action actually computes it. I personally found the following pattern of usage simpler:
    val mwzNew = mwz.mapPartitions(...).cache()
    mwzNew.count() // or mwzNew.foreach(x => {}) -- force evaluation of the new RDD so it is materialized
    mwz.unpersist() // drop the old, no longer needed RDD from memory and disk

2014-05-04 5:16 GMT+02:00 Earthson <earthson...@gmail.com>:

> I'm using Spark for an LDA implementation. I need to cache an RDD for the
> next step of Gibbs sampling, and the previously cached RDD could then be
> uncached. Something like an LRU cache should drop the previous cache,
> since it is never used again, but the caching doesn't behave as expected.
>
> Here is the code :)
> https://github.com/Earthson/sparklda/blob/master/src/main/scala/net/earthson/nlp/lda/lda.scala#L99
>
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n5292/sparklda_cache1.png
> http://apache-spark-user-list.1001560.n3.nabble.com/file/n5292/sparklda_cache2.png
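Applied to an iterative job like your Gibbs sampling, the same idea is: persist the new RDD, force it with an action, then unpersist the previous one, so only one iteration's data stays cached at a time. Below is a rough sketch of that loop (my own illustration, not taken from your repository; `step` stands in for whatever one sampling pass does):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.storage.StorageLevel

    // Run `step` for a number of rounds. Each new RDD is materialised
    // before its parent is unpersisted, so the parent's cached blocks are
    // still available while the child is computed, and only the latest
    // iteration stays cached afterwards.
    def iterate[T](initial: RDD[T], iterations: Int)(step: RDD[T] => RDD[T]): RDD[T] = {
      var current: RDD[T] = initial.persist(StorageLevel.MEMORY_AND_DISK)
      current.foreach(_ => ())   // force evaluation so the blocks actually exist
      for (_ <- 1 to iterations) {
        val next = step(current).persist(StorageLevel.MEMORY_AND_DISK)
        next.foreach(_ => ())    // materialise the child first...
        current.unpersist()      // ...then drop the parent's blocks
        current = next
      }
      current
    }

The order matters: if you unpersist the old RDD before the new one has been computed, Spark may have to recompute the old lineage from scratch.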