How about increasing the number of partitions in the RDD, or rebalancing the data across partitions, before saving?
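
As a rough sketch of what I mean (not a tuned recommendation): pick a target partition count from the cluster size, then repartition before the write. The 10 nodes come from your message. The 8 cores per node and 3 tasks per core are assumptions on my part, and PartitionHint / suggestedPartitions are hypothetical names used only for illustration. The Spark call itself is shown as a comment so the snippet compiles without Spark on the classpath.

```java
// Sketch only: PartitionHint and suggestedPartitions are hypothetical names.
final class PartitionHint {

    /** Target partition count = nodes * cores per node * tasks per core. */
    static int suggestedPartitions(int nodes, int coresPerNode, int tasksPerCore) {
        if (nodes <= 0 || coresPerNode <= 0 || tasksPerCore <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.multiplyExact(nodes, coresPerNode), tasksPerCore);
    }

    public static void main(String[] args) {
        // 10 nodes is from the thread; 8 cores per node and 3 tasks per core are assumed.
        int target = suggestedPartitions(10, 8, 3);
        System.out.println("target partitions: " + target);

        // With Spark on the classpath, the write would then look like:
        //   rdd.repartition(target).saveAsTextFile("hdfs://…");
    }
}
```

It is worth checking rdd.getNumPartitions() first. If the RDD already has far more partitions than that, the partition count is probably not the bottleneck, and note that repartition() does a full shuffle, which has its own cost.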
On Sat, Mar 11, 2017 at 2:33 PM, Parsian, Mahmoud wrote:
> How can I improve the performance of JavaRDD.saveAsTextFile("hdfs://…")?
> This is taking over 30 minutes on a cluster of 10 nodes.
> Running Spark on YARN.
>
> JavaRDD has 120 million entries.
How can I improve the performance of JavaRDD.saveAsTextFile("hdfs://…")?
This is taking over 30 minutes on a cluster of 10 nodes.
Running Spark on YARN.
JavaRDD has 120 million entries.
Thank you,
Best regards,
Mahmoud