Hi Neil
Yes, it helps! I no longer see _temporary in the console output.
saveAsTextFile is fast now.
2015-09-02 23:07:00,022 INFO [task-result-getter-0]
scheduler.TaskSetManager (Logging.scala:logInfo(59)) - Finished task 18.0
in stage 0.0 (TID 18) in 4398 ms on ip-10-0-24-103.ec2.internal (
Hi,
Can you set the following parameters in your mapred-site.xml file please:
<property><name>mapred.output.direct.EmrFileSystem</name><value>true</value></property>
<property><name>mapred.output.direct.NativeS3FileSystem</name><value>true</value></property>
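
For anyone who prefers to generate the file rather than edit it by hand, here is a minimal sketch of how the two properties above could be rendered in Hadoop's <configuration>/<property> XML layout. The helper name render_hadoop_properties is hypothetical and only illustrates the file format; it does not touch a cluster or an existing mapred-site.xml.

```python
import xml.etree.ElementTree as ET

# Property names and values are taken from the message above.
DIRECT_OUTPUT_PROPS = {
    "mapred.output.direct.EmrFileSystem": "true",
    "mapred.output.direct.NativeS3FileSystem": "true",
}


def render_hadoop_properties(props):
    """Return a <configuration> XML string with one <property> per entry.

    Hypothetical helper: it only builds the text, it does not write any file.
    """
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hadoop_properties(DIRECT_OUTPUT_PROPS))
```

The output would still need to be merged into the existing mapred-site.xml by hand, since a real file usually carries other properties as well.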
You can also configure this at cluster launch time with the following
classification via the EMR console:
classification=mapred-site,properti
I checked the previous EMR config (emr-3.8).
Its mapred-site.xml has the following setting:
<property><name>mapred.output.committer.class</name><value>org.apache.hadoop.mapred.DirectFileOutputCommitter</value></property>
On Tue, Sep 1, 2015 at 7:33 PM, Alexander Pivovarov
wrote:
> Should I use DirectOutputCommitter?
> spark.hadoop.mapred.output.commi
Should I use DirectOutputCommitter?
spark.hadoop.mapred.output.committer.class
com.appsflyer.spark.DirectOutputCommitter
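
A minimal sketch of how that setting could be passed on the command line, assuming the usual Spark convention that keys prefixed with spark.hadoop. are copied into the Hadoop configuration and that spark-submit accepts them through --conf. The helper name spark_conf_args is hypothetical, and the sketch assumes the committer class is already on the job's classpath, which this thread does not confirm.

```python
def spark_conf_args(hadoop_props):
    """Build spark-submit --conf arguments from Hadoop properties.

    Hypothetical helper: each Hadoop key gets the 'spark.hadoop.' prefix so
    that Spark forwards it to the Hadoop configuration.
    """
    args = []
    for key, value in hadoop_props.items():
        args += ["--conf", f"spark.hadoop.{key}={value}"]
    return args


if __name__ == "__main__":
    # Key and class name are the ones quoted in the message above.
    committer = {
        "mapred.output.committer.class": "com.appsflyer.spark.DirectOutputCommitter",
    }
    print(" ".join(spark_conf_args(committer)))
```

The resulting arguments would go between spark-submit and the application jar or script.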
On Tue, Sep 1, 2015 at 4:01 PM, Alexander Pivovarov
wrote:
> I run Spark 1.4.1 on Amazon AWS EMR 4.0.0
>
> For some reason spark saveAsTextFile is very slow on emr 4.0.0 in