IIUC, once the references to RDDs are gone, the related files (e.g., shuffle data) of these RDDs are automatically removed by `ContextCleaner` ( https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178 ). Since Spark can recompute from the data sources (this is a fundamental concept of RDDs), removing these files directly should not, by itself, result in failed jobs. Still, I don't think removing them by yourself is a smart way to handle this.
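If you do want to look at these files by hand, a dry run that only lists the old ones is a safer first step than `rm -rf`, since it shows the timestamps before anything is deleted. Below is a minimal sketch in plain Python, not anything Spark provides. It assumes the shuffle files can be recognized by their `.index` and `.data` suffixes, as described in this thread; the function name `find_old_shuffle_files`, the example directory, and the age threshold are all hypothetical.

```python
import os
import time


def find_old_shuffle_files(root, max_age_seconds, now=None):
    """List .index/.data files under `root` that are older than the threshold.

    Returns (path, age_in_seconds) tuples, oldest first. Nothing is deleted.
    """
    now = time.time() if now is None else now
    old_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: shuffle files are identified by suffix only.
            if not name.endswith((".index", ".data")):
                continue
            path = os.path.join(dirpath, name)
            try:
                age = now - os.path.getmtime(path)
            except OSError:
                # The file may have been cleaned up while we were walking.
                continue
            if age > max_age_seconds:
                old_files.append((path, age))
    return sorted(old_files, key=lambda item: item[1], reverse=True)


# Example use (hypothetical local directory, 24-hour threshold):
#   for path, age in find_old_shuffle_files("/path/to/spark/local/dir", 24 * 3600):
#       print(f"{age / 3600:8.1f}h  {path}")
```

If the list shows files dating back to the start of the application, that would support the guess that they are not being cleaned up; if they are all recent, the disk usage is more likely just the working set of the running query.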
I'm not exactly sure about the query in your streaming application, but I think that query might be what causes the situation you described.

On Fri, Jan 27, 2017 at 1:48 PM, <kanth...@gmail.com> wrote:

> Hi!
>
> Yes, these files are for shuffle blocks; however, they need to be cleaned
> up as well, right? I had been running a streaming application for 2 days.
> On the third day my disk filled up with .index and .data files, and my
> assumption is that these files had been there since the start of my
> streaming application. I should have checked the timestamps before doing
> rm -rf. Please let me know if I am wrong.
>
> Sent from my iPhone
>
> On Jan 26, 2017, at 4:24 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
> Yea, I think so, and they are the intermediate files for shuffling.
> Probably kant checked the configuration here (
> http://spark.apache.org/docs/latest/spark-standalone.html), though this
> is not related to the issue.
>
> // maropu
>
> On Fri, Jan 27, 2017 at 7:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> The files are for shuffle blocks. Where did you find the docs about them?
>>
>> Jacek
>>
>> On 25 Jan 2017 8:41 p.m., "kant kodali" <kanth...@gmail.com> wrote:
>>
>> Oh sorry, it's actually in the documentation. I should just
>> set spark.worker.cleanup.enabled = true
>>
>> On Wed, Jan 25, 2017 at 11:30 AM, kant kodali <kanth...@gmail.com> wrote:
>>
>>> I have a bunch of .index and .data files like that filling up my disk.
>>> I am not sure what the fix is. I am running Spark 2.0.2 in standalone
>>> mode.
>>>
>>> Thanks!
>>
>
> --
> ---
> Takeshi Yamamuro

--
---
Takeshi Yamamuro