A new broadcast object is generated on every iteration step; over many iterations this can eat up memory and cause persist to fail.
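For concreteness, here is a minimal sketch of the kind of loop I mean (the object name, the toy data, and the update rule are made up for illustration, not my actual job):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    object IterationWithBroadcast {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("iteration-with-broadcast"))

        // Input cached so it is (hopefully) not recomputed on every iteration.
        val data = sc.parallelize(1 to 1000000, 8)
          .map(_.toDouble)
          .persist(StorageLevel.MEMORY_AND_DISK)

        var weights = Array.fill(1000)(0.0)
        for (i <- 1 to 100) {
          // A new broadcast object is created on every iteration step. Old ones
          // cannot simply be dropped, because recomputing a lost partition of the
          // cached RDD may still need them, so they accumulate in memory.
          val bc = sc.broadcast(weights)
          // Toy update: just a placeholder for the real per-iteration computation.
          val delta = data.map(x => x * (bc.value(0) + 1.0)).mean()
          weights = weights.map(_ + delta * 1e-6)
        }

        sc.stop()
      }
    }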
The broadcast objects cannot simply be removed, because the RDD may be recomputed and would need them again. At the same time, I am trying to prevent the RDD from being recomputed, which requires the old broadcasts to release some memory. I tried setting "spark.cleaner.ttl", but then my tasks ran into an error (broadcast object not found); I think the tasks were being recomputed after the broadcast had already been cleaned. I don't think this is a good approach anyway, since it makes my code depend much more on the environment. So I changed the persist level of my RDD to MEMORY_AND_DISK, but it ran into the same error (broadcast object not found), and in the end I removed the spark.cleaner.ttl setting.

I think cache support could be friendlier, and broadcast objects should be cached too. Flushing old objects to disk is much more important than flushing the new ones.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Cache-issue-for-iteration-with-broadcast-tp5350.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.