Re: which aws instance type for shuffle performance

2015-12-18 Thread Alexander Pivovarov
Andrew, it's going to be 4 executor JVMs on each r3.8xlarge. Rastan, you can run a quick test using an EMR Spark cluster on spot instances and see which configuration works better. Without the tests it is all speculation. On Dec 18, 2015 1:53 PM, "Andrew Or" wrote: > Hi Rastan, > > Unless you're using

Re: which aws instance type for shuffle performance

2015-12-18 Thread Andrew Or
Hi Rastan, Unless you're using off-heap memory or starting multiple executors per machine, I would recommend the r3.2xlarge option, since you don't actually want gigantic heaps (100GB is more than enough). I've personally run Spark on a very large scale with r3.8xlarge instances, but I've been usi

which aws instance type for shuffle performance

2015-12-15 Thread Rastan Boroujerdi
I'm trying to determine whether I should be using 10 r3.8xlarge or 40 r3.2xlarge. I'm mostly concerned with shuffle performance of the application. If I go with r3.8xlarge I will need to configure 4 worker instances per machine to keep the JVM size down. The worker instances will likely contend wi
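
For reference, the arithmetic behind the two layouts in the question can be sketched as below. This is only a rough sketch: the vCPU and memory figures are the published specs for the R3 family, the `per_worker` helper and its field names are made up for illustration, and it ignores OS and Spark overhead, so the usable heap per JVM would be smaller than the raw split.

```python
# Published specs for the two instance types (vCPUs, memory in GiB).
INSTANCE_SPECS = {
    "r3.2xlarge": {"vcpus": 8, "mem_gib": 61},
    "r3.8xlarge": {"vcpus": 32, "mem_gib": 244},
}


def per_worker(instance_type, machines, workers_per_machine):
    """Hypothetical helper: split each machine evenly among its workers."""
    spec = INSTANCE_SPECS[instance_type]
    return {
        "total_workers": machines * workers_per_machine,
        "cores_per_worker": spec["vcpus"] // workers_per_machine,
        "mem_gib_per_worker": spec["mem_gib"] / workers_per_machine,
        "total_vcpus": machines * spec["vcpus"],
        "total_mem_gib": machines * spec["mem_gib"],
    }


# Option A: 10 large machines, 4 worker instances each.
big = per_worker("r3.8xlarge", machines=10, workers_per_machine=4)
# Option B: 40 small machines, 1 worker instance each.
small = per_worker("r3.2xlarge", machines=40, workers_per_machine=1)

print("10 x r3.8xlarge, 4 workers each:", big)
print("40 x r3.2xlarge, 1 worker each: ", small)
```

On raw cores and memory the two layouts come out the same per worker, so the difference has to come from whatever is shared per machine (network, disks), which this sketch does not model. That is the part a quick test would have to settle.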