Yes, we are facing the same problem. The workaround we came up with is:

- store the broadcast variable's id when it is first created
- create a scheduled job that runs every (spark.cleaner.ttl - 1 minute) and re-creates the same broadcast variable using that same id

This way Spark is happy, because it keeps finding the same broadcast file (broadcast_<id>):

    val httpBroadcastFactory = new HttpBroadcastFactory()
    httpBroadcastFactory.newBroadcast(bcastVariable.value, false, id)

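
For anyone who wants to try this, here is a rough sketch of how the scheduled job could be wired up with a plain java.util.concurrent scheduler. It is only an illustration, not our production code: BroadcastRefresher, refreshIntervalSeconds, schedule and recreate are names made up for this example, and the TTL is assumed to be given in seconds with a 60 second safety margin. The actual re-creation of the broadcast is passed in as a function, so the sketch itself does not depend on Spark.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}

// Hypothetical helper, not part of Spark.
object BroadcastRefresher {

  // Refresh interval: the cleaner TTL minus a safety margin (both in seconds).
  def refreshIntervalSeconds(ttlSeconds: Long, marginSeconds: Long = 60L): Long = {
    require(ttlSeconds > marginSeconds, "TTL must be larger than the safety margin")
    ttlSeconds - marginSeconds
  }

  // Runs `recreate` once per interval on a single background thread.
  // The caller must shut the returned scheduler down when the job ends.
  def schedule(ttlSeconds: Long)(recreate: () => Unit): ScheduledExecutorService = {
    val interval = refreshIntervalSeconds(ttlSeconds)
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    val task = new Runnable {
      def run(): Unit =
        // scheduleAtFixedRate cancels all later runs if one run throws,
        // so failures are caught and logged here instead.
        try recreate()
        catch { case e: Exception => System.err.println("broadcast refresh failed: " + e) }
    }
    scheduler.scheduleAtFixedRate(task, interval, interval, TimeUnit.SECONDS)
    scheduler
  }
}

// Possible usage with the call shown above (assumes a TTL of 3600 seconds):
// val scheduler = BroadcastRefresher.schedule(ttlSeconds = 3600) { () =>
//   new HttpBroadcastFactory().newBroadcast(bcastVariable.value, false, id)
// }
```

The first refresh fires one interval after scheduling, which assumes the broadcast variable was created just before the job was scheduled.
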
On Wed, Mar 12, 2014 at 2:38 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> And to answer your original question, spark.cleaner.ttl is not safe for
> the exact reason you brought up. The PR Mark linked intends to provide a
> much cleaner (and safer) solution.
>
>
> On Tue, Mar 11, 2014 at 2:01 PM, Mark Hamstra <m...@clearstorydata.com> wrote:
>
>> Actually, TD's work-in-progress is probably more what you want:
>> https://github.com/apache/spark/pull/126
>>
>>
>> On Tue, Mar 11, 2014 at 1:58 PM, Michael Allman <m...@allman.ms> wrote:
>>
>>> Hello,
>>>
>>> I've been trying to run an iterative spark job that spills 1+ GB to disk
>>> per iteration on a system with limited disk space. I believe there's enough
>>> space if spark would clean up unused data from previous iterations, but as
>>> it stands the number of iterations I can run is limited by available disk
>>> space.
>>>
>>> I found a thread on the usage of spark.cleaner.ttl on the old Spark
>>> Users Google group here:
>>>
>>> https://groups.google.com/forum/#!topic/spark-users/9ebKcNCDih4
>>>
>>> I think this setting may be what I'm looking for, however the cleaner
>>> seems to delete data that's still in use. The effect is I get bizarre
>>> exceptions from Spark complaining about missing broadcast data or
>>> ArrayIndexOutOfBounds. When is spark.cleaner.ttl safe to use? Is it
>>> supposed to delete in-use data or is this a bug/shortcoming?
>>>
>>> Cheers,
>>>
>>> Michael
>>>
>>
>

--
Sourav Chandra
Senior Software Engineer

sourav.chan...@livestream.com
o: +91 80 4121 8723
m: +91 988 699 3746
skype: sourav.chandra

Livestream
"Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd Block,
Koramangala Industrial Area, Bangalore 560034
www.livestream.com