I'm trying to determine whether I should be using 10 r3.8xlarge or 40 r3.2xlarge. I'm mostly concerned with shuffle performance of the application.
If I go with r3.8xlarge I will need to configure 4 worker instances per machine to keep the JVM size down. The worker instances will likely contend with each other for network and disk I/O if they are on the same machine. If I go with 40 r3.2xlarge I will be able to allocate a single worker instance per box, allowing each worker instance to have its own dedicated network and disk I/O. Since shuffle performance is heavily impacted by disk and network throughput, it seems like going with 40 r3.2xlarge would be the better configuration between the two. Is my analysis correct? Are there other tradeoffs that I'm not taking into account? Does spark bypass the network transfer and read straight from disk if worker instances are on the same machine? Thanks, Rastan