Our use case is as follows:
We repartition six months' worth of data for each client on clientId and
recordcreationdate, so that Spark writes one file per partition. The output
is partitioned by clientId and recordcreationdate.
The job fills up the disk after it has processed, say, 30 tenants out of 50.
I am looking for a way to clean up these shuffle files while the job is
running.
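Roughly, the per-client step looks like the sketch below (simplified and
illustrative only: the session setup, paths, tenant list, and the six-month
filter are placeholders; the repartition columns and the one-file-per-partition
write match what the job actually does):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_date, date_sub}

val spark = SparkSession.builder()
  .appName("repartition-per-client")   // placeholder app name
  .getOrCreate()

// Hypothetical tenant list; in reality there are about 50 clients.
val clientIds: Seq[String] = spark.read.parquet("/data/raw")
  .select("clientId").distinct().collect().map(_.getString(0)).toSeq

clientIds.foreach { id =>
  // Last six months of data for this client
  // (assumes recordcreationdate is a date column).
  val sixMonths = spark.read.parquet("/data/raw")
    .filter(col("clientId") === id)
    .filter(col("recordcreationdate") >= date_sub(current_date(), 180))

  // One shuffle per client: repartition on the output partition columns so
  // that each (clientId, recordcreationdate) directory ends up with a single
  // file.
  sixMonths
    .repartition(col("clientId"), col("recordcreationdate"))
    .write
    .partitionBy("clientId", "recordcreationdate")
    .mode("append")
    .parquet("/data/out")
}

Each iteration leaves the shuffle files of its repartition on the executors'
local disks, which is where the disk usage builds up.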
Hi,
I have a long-running application, and Spark seems to fill up the disk with
shuffle files. Eventually the job fails after running out of disk space. Is
there a way for me to clean up the shuffle files?
Thanks
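For example, I am not sure whether the ContextCleaner's periodic GC setting is
relevant here; a rough sketch of what I mean (spark.cleaner.periodicGC.interval
is an existing Spark setting, default 30min, but I do not know whether it helps
with shuffle files in my case):

import org.apache.spark.sql.SparkSession

// Sketch only: this makes the driver trigger a GC more often so the
// ContextCleaner can release resources for shuffles that are no longer
// referenced; I am not sure it applies to my situation.
val spark = SparkSession.builder()
  .appName("long-running-app")   // placeholder name
  .config("spark.cleaner.periodicGC.interval", "10min")
  .getOrCreate()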
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/