
Re: saveAsTextFile and tmp files generations in tasks

2015-04-15, Gil Vernik

> Cc: dev
> Date: 15/04/2015 06:20 PM
> Subject: Re: saveAsTextFile and tmp files generations in tasks
>
> The temp file creation is controlled by a Hadoop OutputCommitter, which is FileOutputCommitter by default. It's used in SparkHadoopWriter (which in turn is used by …

Re: saveAsTextFile and tmp files generations in tasks

2015-04-15, Imran Rashid

The temp file creation is controlled by a Hadoop OutputCommitter, which is FileOutputCommitter by default. It's used in SparkHadoopWriter (which in turn is used by PairRDDFunctions.saveAsHadoopDataset). You could change the output committer to one that does not use tmp files (e.g., use this from Aaron Da…
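In a Spark 1.x setup like the one in this thread, the committer can be swapped without code changes: `saveAsTextFile` goes through the old `mapred` API, `JobConf.getOutputCommitter()` reads the `mapred.output.committer.class` property, and Spark copies every `spark.hadoop.*` setting into `sc.hadoopConfiguration`. The fragment below is a minimal sketch under those assumptions. `com.example.DirectOutputCommitter` is a hypothetical class name standing in for whichever committer you use (the one linked in the truncated message is not reproduced here), and it would have to extend `org.apache.hadoop.mapred.OutputCommitter`.

```properties
# spark-defaults.conf (equivalently: spark-shell --conf key=value)
# Hypothetical class name; must be an org.apache.hadoop.mapred.OutputCommitter.
spark.hadoop.mapred.output.committer.class  com.example.DirectOutputCommitter
```

Writing straight to the final location avoids the `_temporary` directories, but it also gives up the isolation they provide: output from failed or speculative task attempts can land in the destination. For that reason such committers are generally only considered safe with speculative execution (`spark.speculation`) left disabled.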

saveAsTextFile and tmp files generations in tasks

2015-04-14, Gil Vernik

Hi, I ran a very simple operation via ./spark-shell (version 1.3.0):

    val data = Array(1, 2, 3, 4)
    val distd = sc.parallelize(data)
    distd.saveAsTextFile(..)

When I executed it, I saw that 4 tasks were created in Spark. Each task created 2 temp files at different stages: there was a 1st tmp file (…
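One plausible reading of "2 temp files at different stages" (the message is truncated) is the two-phase commit performed by FileOutputCommitter, which the replies above identify as the default committer. The sketch below is a toy, in-memory Scala model of that protocol as Hadoop's FileOutputCommitter (algorithm version 1) lays it out: a task first writes under its private attempt directory, `commitTask` renames that to a committed-task directory (both under `_temporary`), and `commitJob` moves the files into the output directory and writes `_SUCCESS`. `CommitterModel` and its methods are hypothetical and not part of the Hadoop or Spark API; real attempt and task IDs are longer than the simplified names used here.

```scala
import scala.collection.mutable

// Toy, in-memory model of the two-phase commit done by Hadoop's
// FileOutputCommitter (algorithm version 1). NOT the Hadoop API: paths are
// plain strings and the attempt/task IDs are simplified, hypothetical names.
object CommitterModel {
  // Fake file system: full path -> file contents.
  val fs = mutable.LinkedHashMap.empty[String, String]

  // Stage 1 location: private directory of one task *attempt*.
  def attemptDir(out: String, task: Int): String =
    f"$out/_temporary/0/_temporary/attempt_$task%05d_0"

  // Stage 2 location: directory of a *committed* task, still under _temporary.
  def committedTaskDir(out: String, task: Int): String =
    f"$out/_temporary/0/task_$task%05d"

  // The task writes its part file inside its attempt directory.
  def writeTask(out: String, task: Int, data: String): Unit =
    fs(f"${attemptDir(out, task)}/part-$task%05d") = data

  // commitTask: rename the attempt directory to the committed-task directory.
  def commitTask(out: String, task: Int): Unit =
    move(attemptDir(out, task), committedTaskDir(out, task))

  // commitJob: move every committed task's files into the final output
  // directory, delete anything left under _temporary, then write _SUCCESS.
  def commitJob(out: String, tasks: Seq[Int]): Unit = {
    tasks.foreach(t => move(committedTaskDir(out, t), out))
    fs.keys.filter(_.startsWith(s"$out/_temporary/")).toList.foreach(fs.remove)
    fs(s"$out/_SUCCESS") = ""
  }

  // Move every file under fromDir to the same relative path under toDir.
  private def move(fromDir: String, toDir: String): Unit =
    fs.keys.filter(_.startsWith(fromDir + "/")).toList.foreach { p =>
      fs(toDir + p.stripPrefix(fromDir)) = fs.remove(p).get
    }
}
```

Running `writeTask`, `commitTask` and `commitJob` for four tasks, mirroring the `spark-shell` example, leaves each part file first under `_temporary/0/_temporary/attempt_<n>_0/`, then under `_temporary/0/task_<n>/`, and finally only `out/part-00000` to `out/part-00003` plus `out/_SUCCESS`.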