Hi,

We set spark.cleaner.ttl to a reasonable duration (the value is in
seconds) and also set spark.streaming.unpersist=true.

Together, those two settings cleaned up the shuffle files for us.
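For illustration, here is a minimal sketch of passing those two settings to spark-submit. The helper name cleanup_conf_args and the 12-hour TTL are assumptions, not values from this thread: 12 hours was chosen only to match the 12-hour cron threshold mentioned further down. As documented for Spark 1.x, spark.cleaner.ttl is in seconds, and any RDD persisted longer than the TTL is also cleared, so the TTL should stay above your longest window.

```shell
#!/bin/sh
# Hypothetical helper: builds spark-submit --conf flags for the two
# settings above. The TTL argument is in hours and converted to seconds,
# because spark.cleaner.ttl is expressed in seconds.
# Keep the TTL above the longest window: persisted RDDs older than the
# TTL are cleared too.

# Usage: cleanup_conf_args TTL_HOURS
cleanup_conf_args() {
  ttl_seconds=$(( $1 * 3600 ))
  printf '%s\n' "--conf spark.cleaner.ttl=${ttl_seconds} --conf spark.streaming.unpersist=true"
}

# Example (class and jar names are placeholders):
# spark-submit $(cleanup_conf_args 12) --class com.example.StreamingApp app.jar
```

The flags are printed as one line and left unquoted at the call site so the shell splits them into separate arguments; this is safe here only because none of the values contain spaces.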


-Conor

On Tue, Apr 21, 2015 at 8:18 AM, N B <nb.nos...@gmail.com> wrote:

> We already have a cron job in place to clean up just the shuffle files.
> However, what I would really like to know is whether there is a "proper"
> way of telling Spark to clean up these files once it's done with them.
>
> Thanks
> NB
>
>
> On Mon, Apr 20, 2015 at 10:47 AM, Jeetendra Gangele <gangele...@gmail.com> wrote:
>
>> Write a cron job for this, like the one below:
>>
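>> # cron does not read your shell profile, so define SPARK_HOME and
>> # SPARK_LOCAL_DIR at the top of the crontab. 1440 min = 24 hours.
>> # -mindepth 1 keeps the first job from deleting $SPARK_HOME/work itself.
>> 12 * * * *  find $SPARK_HOME/work -mindepth 1 -cmin +1440 -prune -exec rm -rf {} \+
>> 32 * * * *  find /tmp -type d -cmin +1440 -name "spark-*-*-*" -prune -exec rm -rf {} \+
>> 52 * * * *  find $SPARK_LOCAL_DIR -mindepth 1 -maxdepth 1 -type d -cmin +1440 -name "spark-*-*-*" -prune -exec rm -rf {} \+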
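Because these jobs run rm -rf unattended, it may be worth previewing what they would match before installing them. A minimal sketch, assuming a POSIX shell and a find that supports -mindepth, -maxdepth and -cmin (GNU and BSD find both do). The helper name preview_stale_spark_dirs is hypothetical, and its AGE argument is passed straight to find -cmin (e.g. +1440 as in the crontab).

```shell
#!/bin/sh
# Hypothetical helper: lists, without deleting, the directories the
# :52 crontab job would remove. Same predicates as that job, but -print
# instead of -exec rm -rf.

# Usage: preview_stale_spark_dirs ROOT AGE   (AGE as for find -cmin, e.g. +1440)
preview_stale_spark_dirs() {
  root="$1"
  age="$2"
  find "$root" -mindepth 1 -maxdepth 1 -type d -cmin "$age" \
    -name 'spark-*-*-*' -prune -print
}
```

Running it with +1440 against your local directory shows exactly which spark-* directories the cron job would delete; an AGE such as -5 (changed within the last five minutes) is handy for checking that the name pattern matches what you expect.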
>>
>>
>> On 20 April 2015 at 23:12, N B <nb.nos...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I had asked this as part of a different thread but got no response
>>> there, so I am starting a new thread in the hope of catching someone's
>>> attention.
>>>
>>> We are seeing shuffle files left behind and never cleaned up by Spark.
>>> Because this is a Spark Streaming application that is expected to run
>>> indefinitely, the uncleaned shuffle files are a serious problem. Our
>>> maximum window size is 6 hours, so we have set up a cron job that deletes
>>> shuffle files older than 12 hours; otherwise they would eventually fill
>>> our disks.
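A cron job like the one just described might look roughly like this minimal sketch. Assumptions, none of which come from this thread: the helper name clean_shuffle_files; that shuffle files are named shuffle_* inside blockmgr-* subdirectories of the Spark local directory (the naming Spark 1.x's block manager uses, to the best of my knowledge); and the use of modification time (-mmin), on the reasoning that shuffle files are written once. The threshold is twice the longest window, mirroring the 6-hour window and 12-hour cutoff above.

```shell
#!/bin/sh
# Hypothetical helper: deletes shuffle_* files older than twice the
# longest window under a Spark local directory, printing each path it
# removes. Other block files (e.g. rdd_*) are left alone.

# Usage: clean_shuffle_files LOCAL_DIR MAX_WINDOW_HOURS
clean_shuffle_files() {
  local_dir="$1"
  window_hours="$2"
  # 2x the window, in minutes: 6 h window -> 12 h -> 720 min.
  max_age_min=$(( window_hours * 2 * 60 ))
  find "$local_dir" -type f -name 'shuffle_*' -mmin +"$max_age_min" \
    -print -exec rm -f {} +
}
```

Keeping the cutoff well above the window matters: deleting shuffle files that a running job still needs may cause fetch failures and force stages to be recomputed.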
>>>
>>> Please see the links below. It appears that 1.3.1 documents this lack
>>> of shuffle-file cleanup.
>>>
>>> https://github.com/apache/spark/pull/5074/files
>>> https://issues.apache.org/jira/browse/SPARK-5836
>>>
>>>
>>> Also, for some reason, the following JIRAs, which were reported as
>>> functional issues, were closed as duplicates of the documentation issue
>>> above. Does this mean the underlying problem won't be addressed at all?
>>>
>>> https://issues.apache.org/jira/browse/SPARK-3563
>>> https://issues.apache.org/jira/browse/SPARK-4796
>>> https://issues.apache.org/jira/browse/SPARK-6011
>>>
>>> Any insight into whether this is being looked into, and how to handle
>>> shuffle files in the meantime, would be greatly appreciated.
>>>
>>> Thanks
>>> NB