And to answer your original question: spark.cleaner.ttl is not safe, for exactly
the reason you brought up — it discards data based purely on age, with no check
that the data is still in use. The PR Mark linked is intended to provide a much
cleaner (and safer) solution.
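
For anyone who hasn't used the setting, here's a minimal sketch of how it is
typically applied (the app name and the one-hour TTL are placeholders, and this
assumes the 0.9-style SparkConf API):

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: enable the periodic metadata cleaner.
    // spark.cleaner.ttl is a duration in seconds; metadata (and the shuffle
    // and broadcast data behind it) older than that duration is dropped
    // purely by age, with no check that it is still referenced.
    val conf = new SparkConf()
      .setAppName("iterative-job")        // placeholder name
      .set("spark.cleaner.ttl", "3600")   // drop anything older than one hour
    val sc = new SparkContext(conf)

    // If a later iteration still depends on a lineage, broadcast variable,
    // or shuffle output created more than an hour earlier, the job can fail
    // with exactly the kind of missing-broadcast / ArrayIndexOutOfBounds
    // errors described below.

As I understand it, the PR above takes a different approach and cleans up RDDs,
shuffles, and broadcasts only once they are no longer referenced, rather than on
a timer.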


On Tue, Mar 11, 2014 at 2:01 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> Actually, TD's work-in-progress is probably more what you want:
> https://github.com/apache/spark/pull/126
>
>
> On Tue, Mar 11, 2014 at 1:58 PM, Michael Allman <m...@allman.ms> wrote:
>
>> Hello,
>>
>> I've been trying to run an iterative Spark job that spills 1+ GB to disk
>> per iteration on a system with limited disk space. I believe there would be
>> enough space if Spark cleaned up unused data from previous iterations, but
>> as it stands the number of iterations I can run is limited by the available
>> disk space.
>>
>> I found a thread on the usage of spark.cleaner.ttl on the old Spark Users
>> Google group here:
>>
>> https://groups.google.com/forum/#!topic/spark-users/9ebKcNCDih4
>>
>> I think this setting may be what I'm looking for; however, the cleaner
>> seems to delete data that's still in use. The effect is that I get bizarre
>> exceptions from Spark complaining about missing broadcast data, or
>> ArrayIndexOutOfBounds errors. When is spark.cleaner.ttl safe to use? Is it
>> supposed to delete in-use data, or is this a bug/shortcoming?
>>
>> Cheers,
>>
>> Michael
>>
>>
>>
>
