A new broadcast object is generated for every iteration step, and this can eat
up memory and cause persist to fail.
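
Roughly, my job looks like the sketch below (simplified; the update logic is
just a placeholder, not my real code):

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Simplified shape of the job: every step creates a new broadcast.
    def iterate(sc: SparkContext, data: RDD[Array[Double]], steps: Int): Array[Double] = {
      var model = Array.fill(100)(0.0)
      for (_ <- 1 to steps) {
        val bcModel = sc.broadcast(model)   // a new broadcast on every step
        model = data
          .map(row => row.zip(bcModel.value).map { case (x, m) => x + m })   // placeholder update
          .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
        // the broadcasts from previous steps are never released,
        // so executor memory keeps growing as the iteration goes on
      }
      model
    }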

The old broadcast objects should not be removed, because the RDD may be
recomputed and would still need them. At the same time, I am trying to prevent
the RDD from being recomputed, and that requires the old broadcasts to release
some memory.

I've tried setting "spark.cleaner.ttl", but then my tasks fail with an error
(broadcast object not found); I think the tasks get recomputed after the
broadcast has already been cleaned up. I don't think this is a good approach
anyway, since it makes my code depend much more on the environment.
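
For reference, this is roughly how I set it (the app name and the 3600-second
value here are just examples):

    import org.apache.spark.{SparkConf, SparkContext}

    // periodically clean up metadata and broadcast data older than the TTL (seconds)
    val conf = new SparkConf()
      .setAppName("iteration-with-broadcast")
      .set("spark.cleaner.ttl", "3600")   // example value only
    val sc = new SparkContext(conf)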

So I changed the persist level of my RDD to MEMORY_AND_DISK, but it runs into
the same error (broadcast object not found). In the end I removed the
spark.cleaner.ttl setting.
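
The only change was the storage level, something like this (using the same
`data` RDD as in the sketch above):

    import org.apache.spark.storage.StorageLevel

    // spill partitions that don't fit in memory to disk instead of dropping them
    val cached = data.persist(StorageLevel.MEMORY_AND_DISK)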

I think cache support could be friendlier: broadcast objects should be
cacheable too, and flushing an old object to disk matters much more than
flushing a new one.



