IIUC, once all references to an RDD are gone, the files related to that
RDD (e.g., shuffle data) are automatically removed by `ContextCleaner` (
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178
).
Since Spark can recompute from data sources (this is a fundamental concept
of RDDs), it seems removing these files directly results in failed jobs.
Though, I think removing them yourself is a smarter way.
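
As a rough illustration of the reference-driven cleanup described above,
here is a small Python analogy. It is not Spark code: `FakeRDD` and its
scratch file are hypothetical stand-ins, and `weakref.finalize` only plays
the role that weak-reference tracking plays in `ContextCleaner`. The point
is just that the file disappears once the last reference to its owner is
dropped and the object is collected.

```python
import gc
import os
import tempfile
import weakref


class FakeRDD:
    """Hypothetical stand-in for an RDD that owns one shuffle-like file."""

    def __init__(self, directory):
        # Create a scratch file that plays the role of a shuffle .data file.
        fd, self.path = tempfile.mkstemp(suffix=".data", dir=directory)
        os.close(fd)
        # Register a cleanup callback that runs once this object is collected.
        # The callback is given the path string, not self, so it holds no
        # reference that would keep the object alive.
        weakref.finalize(self, os.remove, self.path)


def demo():
    """Return (exists_while_referenced, exists_after_release)."""
    with tempfile.TemporaryDirectory() as directory:
        rdd = FakeRDD(directory)
        path = rdd.path
        before = os.path.exists(path)
        del rdd        # drop the last reference to the owner
        gc.collect()   # ask for a collection explicitly
        after = os.path.exists(path)
        return before, after


if __name__ == "__main__":
    print(demo())
```

CPython collects the object as soon as the last reference is dropped. A JVM
gives no such timing guarantee, so on a driver that rarely runs a full GC
the files can outlive the last reference for a while.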

I'm not exactly sure about your streaming query, but I think it might be
what causes the situation you described.
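
For the timestamp check mentioned in the quoted message below, a minimal
sketch that lists `.index` and `.data` files by age before anything is
deleted could look like this. The function name `shuffle_file_ages` and the
directory passed to it are assumptions for illustration; the script only
reads modification times and removes nothing.

```python
import os
import sys
import time


def shuffle_file_ages(root, suffixes=(".index", ".data"), now=None):
    """Return (age_in_seconds, path) for matching files under root, oldest first."""
    now = time.time() if now is None else now
    report = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                path = os.path.join(dirpath, name)
                try:
                    mtime = os.path.getmtime(path)
                except OSError:
                    continue  # file vanished between listing and stat
                report.append((now - mtime, path))
    # Largest age first, so the oldest files appear at the top.
    report.sort(reverse=True)
    return report


if __name__ == "__main__":
    # Hypothetical usage: pass the directory that holds the shuffle files;
    # falls back to the current directory when no argument is given.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for age, path in shuffle_file_ages(root):
        print(f"{age / 3600:8.1f} h  {path}")
```

If the oldest entries date back to the start of the application, that
would support the assumption that the files were never cleaned up.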


On Fri, Jan 27, 2017 at 1:48 PM, <kanth...@gmail.com> wrote:

> Hi!
>
> Yes, these files are for shuffle blocks; however, they need to be cleaned
> up as well, right? I had been running a streaming application for 2 days.
> On the third day my disk filled up with .index and .data files, and my
> assumption is that these files had been there since the start of my
> streaming application. I should have checked the timestamps before doing
> rm -rf. Please let me know if I am wrong.
>
> On Jan 26, 2017, at 4:24 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
> Yea, I think so; they are the intermediate files for shuffling.
> kant probably checked the configuration here (
> http://spark.apache.org/docs/latest/spark-standalone.html), though this
> is not related to the issue.
>
> // maropu
>
> On Fri, Jan 27, 2017 at 7:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> The files are for shuffle blocks. Where did you find the docs about them?
>>
>> Jacek
>>
>> On 25 Jan 2017 8:41 p.m., "kant kodali" <kanth...@gmail.com> wrote:
>>
>> Oh sorry, it's actually in the documentation. I should just
>> set spark.worker.cleanup.enabled = true
>>
>> On Wed, Jan 25, 2017 at 11:30 AM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> I have a bunch of .index and .data files like that filling up my disk.
>>> I am not sure what the fix is. I am running Spark 2.0.2 in standalone mode.
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
>


-- 
---
Takeshi Yamamuro
