Hi Andrew Ash, thanks for your reply.
In fact, I have already used unpersist(), but it doesn't take effect.
One reason I selected version 1.0.0 is precisely that it provides the
unpersist() interface.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-
Hi Randy,
In Spark 1.0 there was a lot of work done to allow unpersisting data that's
no longer needed; see the pull request below.
Try running kvGlobal.unpersist() on line 11 before the re-broadcast of the
next variable to see if you can cut the dependency there.
https://github.com/apache/spar
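A sketch of that suggestion, assuming Spark 1.0's Broadcast.unpersist() is available; the loop body here is hypothetical (the kv update and the use of kvGlobal.value are placeholders, not taken from the original code):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch: unpersist each iteration's broadcast before creating the next one,
// so stale broadcast blocks can be dropped from the executors.
def iterate(sc: SparkContext, rdd2: RDD[Int], n: Int): Unit = {
  var kv = Map.empty[Int, Int]
  for (i <- 0 until n) {
    val kvGlobal = sc.broadcast(kv)       // broadcast this iteration's kv
    val rdd1 = rdd2
      .map(t => t + kvGlobal.value.getOrElse(t, 0)) // placeholder computation
      .cache()
    rdd1.count()      // materialize rdd1 while the broadcast is still valid
    kv = kv.updated(i, i)                 // placeholder update of kv
    kvGlobal.unpersist()                  // release before the next re-broadcast
  }
}
```

Note that if rdd1 might be recomputed later (e.g. cached blocks are evicted), the unpersisted broadcast would no longer be available, which is why the sketch forces evaluation with count() before unpersisting.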
But when I put the broadcast variable outside the for loop, it works well
(if we are not concerned about the memory issue you pointed out):
1 var rdd1 = ...
2 var rdd2 = ...
3 var kv = ...
4 var kvGlobal = sc.broadcast(kv) // broadcast kv
5 for (i <- 0 until n) {
6   rdd1 = rdd2.ma
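Since the snippet above is cut off in the archive, a self-contained version of that working pattern might look like the following sketch (doSomething, n, and the sample data are hypothetical stand-ins):

```scala
// Sketch of the broadcast-outside-the-loop pattern.
// doSomething is a placeholder for the real per-element computation.
def doSomething(t: Int, kv: Map[Int, Int]): Int = t + kv.getOrElse(t, 0)

val n = 10                                // hypothetical iteration count
var rdd1 = sc.parallelize(1 to 100)
val rdd2 = sc.parallelize(1 to 100)
val kv = Map(1 -> 1)
val kvGlobal = sc.broadcast(kv)           // broadcast kv once, before the loop
for (i <- 0 until n) {
  rdd1 = rdd2.map { t => doSomething(t, kvGlobal.value) }.cache()
}
```

With a single broadcast created before the loop, every iteration's closure captures the same broadcast object, so no per-iteration broadcast dependency accumulates.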
rdd1 is cached, but it has no effect:
1 var rdd1 = ...
2 var rdd2 = ...
3 var kv = ...
4 for (i <- 0 until n) {
5   var kvGlobal = sc.broadcast(kv) // broadcast kv
6   rdd1 = rdd2.map {
7     case t => doSomething(t, kvGlobal.value)
8   }.cache()
9   var tmp
Is the RDD not cached? Because recomputation may be required, every broadcast
object is included in the dependencies of the RDDs; this may also cause a
memory issue (when n and kv are too large, as in your case).
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-b
I run Spark 1.0.0, the newest under-development version.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-broadcast-variable-in-iteration-tp5479p5480.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.