Actually, TD's work-in-progress is probably more what you want: https://github.com/apache/spark/pull/126
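Until that lands, the TTL-based cleaner is configured through SparkConf. A minimal sketch (Scala, 0.9-era API; the one-hour value is an assumption and must exceed the lifetime of any RDD or broadcast your job still needs, since the cleaner deletes by age, not by reference):

    import org.apache.spark.{SparkConf, SparkContext}

    // spark.cleaner.ttl is in seconds. Metadata, shuffle files, and
    // persisted blocks older than the TTL are dropped whether or not
    // they are still referenced, so the value has to be longer than
    // the useful lifetime of any data in the job.
    val conf = new SparkConf()
      .setAppName("IterativeJob")
      .set("spark.cleaner.ttl", "3600") // hypothetical: one hour

    val sc = new SparkContext(conf)

That age-based behavior is why a too-small TTL produces the missing-data errors described below.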
On Tue, Mar 11, 2014 at 1:58 PM, Michael Allman <m...@allman.ms> wrote:
> Hello,
>
> I've been trying to run an iterative Spark job that spills 1+ GB to
> disk per iteration on a system with limited disk space. I believe
> there's enough space if Spark would clean up unused data from previous
> iterations, but as it stands the number of iterations I can run is
> limited by available disk space.
>
> I found a thread on the usage of spark.cleaner.ttl on the old Spark
> Users Google group here:
>
> https://groups.google.com/forum/#!topic/spark-users/9ebKcNCDih4
>
> I think this setting may be what I'm looking for; however, the cleaner
> seems to delete data that's still in use. The effect is that I get
> bizarre exceptions from Spark complaining about missing broadcast data,
> or ArrayIndexOutOfBoundsException. When is spark.cleaner.ttl safe to
> use? Is it supposed to delete in-use data, or is this a
> bug/shortcoming?
>
> Cheers,
>
> Michael