Ah, yes, I missed that part.
It's `spark.local.dir`:
spark.local.dir (default: /tmp)
Directory to use for "scratch" space in Spark, including map output files
and RDDs that get stored on disk. This should be on a fast, local disk in
your system. It can also be a comma-separated list of multiple directories on
I do think that there is an option to set the temporary shuffle location to
a particular directory. While working with EMR I set it to /mnt1/. Let me
know in case you are not able to find it.
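For reference, a minimal sketch of how the value for `spark.local.dir` could be assembled and passed on the command line. The helper names `local_dir_conf` and `spark_submit_conf_args` are made up for illustration, and `/mnt1` is just the directory mentioned above; only the property name and the `--conf key=value` form of spark-submit come from Spark itself.

```python
import os


def local_dir_conf(dirs):
    """Return a comma-separated value suitable for spark.local.dir.

    Hypothetical helper: checks that every directory exists and is
    writable, because Spark has to create its scratch subdirectories there.
    """
    checked = []
    for d in dirs:
        path = os.path.abspath(d)
        if not os.path.isdir(path):
            raise ValueError(f"not a directory: {path}")
        if not os.access(path, os.W_OK):
            raise ValueError(f"not writable: {path}")
        checked.append(path)
    if not checked:
        raise ValueError("at least one directory is required")
    return ",".join(checked)


def spark_submit_conf_args(dirs):
    """Build the spark-submit arguments that set spark.local.dir."""
    return ["--conf", f"spark.local.dir={local_dir_conf(dirs)}"]


if __name__ == "__main__":
    # /mnt1 is the example from this thread; fall back to /tmp if it is absent.
    target = "/mnt1" if os.path.isdir("/mnt1") else "/tmp"
    print(" ".join(spark_submit_conf_args([target])))
```

The same property can also be set with `SparkSession.builder.config("spark.local.dir", ...)` before the session is created. Note that, according to the Spark configuration docs, cluster managers such as YARN may override this setting with their own local-directory configuration, so on EMR it is worth checking which directory is actually used.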
On Mon, Dec 18, 2017 at 8:10 PM, Mihai Iacob wrote:
> This code generates files under /tmp...blockmgr...
at 10:08 AM, Mihai Iacob wrote:
> When does spark remove them?
>
>
> Regards,
>
> *Mihai Iacob*
> DSX Local <https://datascience.ibm.com/local> - Security, IBM Analytics
>
>
>
> - Original message -
> From: Vadim Semenov
> To: Mihai Iacob
>
When does Spark remove them?
Regards,
Mihai Iacob
DSX Local - Security, IBM Analytics
Spark doesn't remove intermediate shuffle files if they're part of the same
job.
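To see how much of this is lying around, something like the following could be used. It is only a sketch: it assumes the leftover directories have names starting with `blockmgr-` somewhere under the local dir (the thread only shows `/tmp...blockmgr...`), and the function names are made up. It lists directories and sizes and deletes nothing, since removing the scratch space of a running application would break it.

```python
import os
import time


def dir_size(path):
    """Total size in bytes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file disappeared while scanning
    return total


def find_blockmgr_dirs(local_dir="/tmp", prefix="blockmgr-"):
    """Return sorted (path, size_bytes, age_seconds) tuples.

    The prefix is an assumption about how the directories are named.
    """
    found = []
    now = time.time()
    for root, dirs, _files in os.walk(local_dir):
        for d in dirs:
            if not d.startswith(prefix):
                continue
            path = os.path.join(root, d)
            try:
                age = now - os.stat(path).st_mtime
            except OSError:
                continue  # directory disappeared while scanning
            found.append((path, dir_size(path), age))
        # Do not descend into directories that already matched.
        dirs[:] = [d for d in dirs if not d.startswith(prefix)]
    return sorted(found)


if __name__ == "__main__":
    for path, size, age in find_blockmgr_dirs("/tmp"):
        print(f"{path}\t{size} bytes\t{age / 3600:.1f} h since last change")
```

Directories whose last change is much older than any running application are the ones worth a closer look.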
On Mon, Dec 18, 2017 at 3:10 PM, Mihai Iacob wrote:
> This code generates files under /tmp...blockmgr... which do not get
> cleaned up after the job finishes.
>
> Anything wrong with the code below? or are there any
This code generates files under /tmp...blockmgr... which do not get cleaned up after the job finishes.
Anything wrong with the code below? Or are there any known issues with Spark not cleaning up /tmp files?
window = Window.\
    partitionBy('***', 'date_str').\
    orderBy(