Hi All,
It appears that the bottleneck in my job was the EBS volumes: I/O wait
times were very high across the cluster. I was only using 1 volume, and
increasing to 4 made the job faster.
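
For anyone wanting to check for the same symptom, here is a minimal sketch
of measuring I/O wait on a node. It is not taken from the original job: it
assumes a Linux host, and the function names are made up for illustration.
It compares two samples of the aggregate "cpu" line in /proc/stat, where
the fifth counter is time spent in iowait.

```python
import time


def iowait_fraction(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two samples of the
    aggregate "cpu" line from /proc/stat (hypothetical helper)."""
    def counters(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(x) for x in parts[1:]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    # Sum user, nice, system, idle, iowait, irq, softirq and steal.
    # Guest time is skipped because the kernel already counts it in user.
    total = sum(deltas[:8])
    # Index 4 is the iowait counter.
    return deltas[4] / total if total > 0 else 0.0


def sample_iowait(interval_s: float = 5.0, path: str = "/proc/stat") -> float:
    """Read /proc/stat twice, interval_s seconds apart (Linux only)."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval_s)
    with open(path) as f:
        second = f.readline()
    return iowait_fraction(first, second)
```

Calling sample_iowait() on a worker while the slow stage is running gives a
rough number. A consistently large share, together with low CPU usage,
would point at the disks rather than at the computation.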
Thanks,
Pradeep
On Thu, Apr 20, 2017 at 3:12 PM, Pradeep Gollakota
wrote:
Hi All,
I have a simple ETL job that reads some data, shuffles it and writes it
back out. This is running on AWS EMR 5.4.0 using Spark 2.1.0.
After Stage 0 completes and the job starts Stage 1, I see a huge slowdown
in the job. The CPU usage is low on the cluster, as is the network I/O.
From the