Cc: dev
Date: 15/04/2015 06:20 PM
Subject: Re: saveAsTextFile and tmp files generations in tasks
The temp file creation is controlled by a Hadoop OutputCommitter, which is
normally FileOutputCommitter by default. It's used in SparkHadoopWriter
(which in turn is used by PairRDDFunctions.saveAsHadoopDataset).
You could change the output committer to not use tmp files (e.g., use the
DirectOutputCommitter from Aaron Davidson).
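For the archives, a committer along those lines is only a few lines of Scala.
This is a minimal sketch, not Aaron's exact code: it implements the old-API
org.apache.hadoop.mapred.OutputCommitter as all no-ops, and wires it in via
the mapred.output.committer.class property that JobConf.getOutputCommitter
reads (the class and property name in the last snippet are the standard
Hadoop ones; the DirectOutputCommitter name here is just illustrative):

import org.apache.hadoop.mapred.{JobContext, OutputCommitter, TaskAttemptContext}

// A "direct" committer: every phase is a no-op, and needsTaskCommit returns
// false, so task output is never staged in a _temporary directory and never
// renamed on commit.
class DirectOutputCommitter extends OutputCommitter {
  override def setupJob(jobContext: JobContext): Unit = {}
  override def setupTask(taskContext: TaskAttemptContext): Unit = {}
  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
  override def commitTask(taskContext: TaskAttemptContext): Unit = {}
  override def abortTask(taskContext: TaskAttemptContext): Unit = {}
}

// Make saveAsTextFile pick it up (it goes through the old mapred API):
sc.hadoopConfiguration.set(
  "mapred.output.committer.class", classOf[DirectOutputCommitter].getName)

The usual caveat applies: without the temp-then-rename step, a failed or
speculative task attempt can leave partial output in the final directory.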
Hi,
I ran a very simple operation via ./spark-shell (version 1.3.0):
val data = Array(1, 2, 3, 4)
val distd = sc.parallelize(data)
distd.saveAsTextFile(.. )
When I executed it, I saw that 4 tasks were created in Spark. Each task
created 2 temp files at different stages: there was a 1st tmp file (
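For what it's worth, here is a minimal way to reproduce this and inspect the
result in spark-shell (the /tmp/saveAsTextFile-demo path is just an
illustration). The temp files at different stages line up with
FileOutputCommitter's two phases: each task first writes its part file under
<outputDir>/_temporary/..., and the file is then renamed into the final
directory when the task commits.

import java.io.File

val out = "/tmp/saveAsTextFile-demo"      // hypothetical output path
val distd = sc.parallelize(Array(1, 2, 3, 4))
println(distd.partitions.length)          // one save task per partition (4 on the poster's setup)
distd.saveAsTextFile(out)
new File(out).listFiles.foreach(println)  // final part-0000N files (plus _SUCCESS)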