Re: HW imbalance

2015-01-30 Thread Sandy Ryza
Yup, if you turn off YARN's CPU scheduling then you can run executors to take advantage of the extra memory on the larger boxes. But then some of the nodes will end up severely oversubscribed from a CPU perspective, so I would definitely recommend against that.
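
For concreteness, "turning off CPU scheduling" refers to the YARN scheduler's resource calculator; a minimal sketch, assuming the CapacityScheduler is in use (placement of the property is illustrative):

    <!-- capacity-scheduler.xml: schedule containers on memory only -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
    </property>
    <!-- switching to DominantResourceCalculator makes YARN enforce vcores too,
         which avoids the CPU oversubscription Sandy warns about -->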

Re: HW imbalance

2015-01-30 Thread Michael Segel
Sorry, but I think there’s a disconnect. When you launch a job under YARN on any of the Hadoop clusters, the number of mappers/reducers is not fixed; it depends on the amount of available resources. So under Ambari, CM, or MapR’s Admin, you should be able to specify the amount of resources

Re: HW imbalance

2015-01-29 Thread Sandy Ryza
My answer was based on the specs that Antony mentioned: different amounts of memory, but 10 cores on all the boxes. In that case, a single Spark application's homogeneously sized executors won't be able to take advantage of the extra memory on the bigger boxes. Cloudera Manager can certainly con

Re: HW imbalance

2015-01-29 Thread Michael Segel
@Sandy, There are two issues: the Spark context (executor) and then the cluster under YARN. If you have a box where each YARN job needs 3GB, and your machine has 36GB dedicated as a YARN resource, you can run 12 executors on the single node. If you have a box that has 72GB dedicated to YARN
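
As a back-of-the-envelope sketch of the arithmetic above (ignoring the extra container headroom Spark on YARN requests via spark.yarn.executor.memoryOverhead, which lowers the real count slightly):

    executors per node = floor(memory dedicated to YARN / memory per executor)
    36 GB node: floor(36 / 3) = 12 executors
    72 GB node: floor(72 / 3) = 24 executors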

Re: HW imbalance

2015-01-28 Thread simon elliston ball
You shouldn’t have any issues with differing nodes on the latest Ambari and Hortonworks. It works fine for mixed hardware and Spark on YARN. Simon

Re: HW imbalance

2015-01-26 Thread Sandy Ryza
Hi Antony, Unfortunately, all executors for any single Spark application must have the same amount of memory. It's possible to configure YARN with different amounts of memory for each host (using yarn.nodemanager.resource.memory-mb), so other apps might be able to take advantage of the extra memory
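
A sketch of the per-host setting Sandy mentions; each NodeManager reads its own yarn-site.xml, so the bigger boxes can advertise more memory (the values below are illustrative):

    <!-- yarn-site.xml on a 64 GB node -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>57344</value>
    </property>

    <!-- yarn-site.xml on a 128 GB node -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>114688</value>
    </property>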

Re: HW imbalance

2015-01-26 Thread Michael Segel
If you’re running YARN, then you should be able to mix and match where YARN is managing the resources available on the node. Having said that… it depends on which version of Hadoop/YARN. If you’re running Hortonworks and Ambari, then setting up multiple profiles may not be straightforward. (I

Re: HW imbalance

2015-01-26 Thread Antony Mayi
Should have said I am running as yarn-client. All I can see is specifying the generic executor memory that is then used in all containers.
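
The "generic executor memory" Antony refers to is the single per-application setting; a minimal yarn-client sketch (the flag values and jar name are illustrative):

    spark-submit --master yarn-client \
      --num-executors 20 \
      --executor-memory 8g \
      --executor-cores 2 \
      myapp.jar

    # every container requested by this application is sized at 8g (plus overhead);
    # there is no per-host override, which is the limitation discussed in this thread.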

Re: HW imbalance

2015-01-26 Thread Charles Feduke
You should look at using Mesos. This should abstract away the individual hosts into a pool of resources and make the different physical specifications manageable. I haven't tried configuring Spark Standalone mode to have different specs on different machines but based on spark-env.sh.template: #
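
The spark-env.sh knobs Charles is pointing at are set per machine, so each Standalone worker can advertise what it actually has (values illustrative):

    # spark-env.sh on a 64 GB / 10-core worker
    SPARK_WORKER_CORES=10
    SPARK_WORKER_MEMORY=56g

    # spark-env.sh on a 128 GB / 10-core worker
    SPARK_WORKER_CORES=10
    SPARK_WORKER_MEMORY=120g

Note this only changes what each worker offers; a single application's executors are still sized uniformly by spark.executor.memory, as Sandy notes above.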