Yup, if you turn off YARN's CPU scheduling then you can run enough executors
to take advantage of the extra memory on the larger boxes. But then some of
the nodes will end up severely oversubscribed from a CPU perspective, so I
would definitely recommend against that.
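For reference, "turning off CPU scheduling" here usually means switching the
Capacity Scheduler back to its memory-only resource calculator; a sketch,
assuming a stock capacity-scheduler.xml:

  <!-- capacity-scheduler.xml: schedule on memory only, ignoring vcores -->
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
  </property>

With DominantResourceCalculator instead, YARN enforces CPU as well, which is
what prevents the oversubscription described above.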
On Fri, Jan 30, 2015 at 3:31 AM, M
Sorry, but I think there’s a disconnect.
When you launch a job under YARN on any of the Hadoop clusters, the number of
mappers/reducers is not fixed; it depends on the amount of available
resources.
So under Ambari, CM, or MapR’s Admin, you should be able to specify the amount
of resources available to YARN on each node.
My answer was based off the specs that Antony mentioned: different amounts
of memory, but 10 cores on all the boxes. In that case, a single Spark
application's homogeneously sized executors won't be able to take advantage
of the extra memory on the bigger boxes.
Cloudera Manager can certainly configure different YARN memory capacities for
different hosts.
@Sandy,
There are two issues: the Spark context (executor) configuration, and the
cluster resources managed under YARN.
If you have a box where each YARN container needs 3GB, and your machine has
36GB dedicated as a YARN resource, you can run 12 executors on that single
node. If you have a box that has 72GB dedicated to YARN, you can run 24.
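For concreteness, the Spark-side request that arithmetic corresponds to might
look like the following (the 3g figure and the app names are placeholders;
YARN also pads each container by spark.yarn.executor.memoryOverhead, so the
real per-node counts come out a bit lower):

  # 3GB executors: ~12 fit on a 36GB NodeManager, ~24 on a 72GB one
  spark-submit --master yarn-cluster \
    --executor-memory 3g \
    --num-executors 24 \
    --class com.example.MyApp myapp.jar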
You shouldn’t have any issues with differing nodes on the latest Ambari and
Hortonworks. It works fine for mixed hardware and Spark on YARN.
Simon
> On Jan 26, 2015, at 4:34 PM, Michael Segel wrote:
>
> If you’re running YARN, then you should be able to mix and match where YARN is
> managing the resources available on the node.
Hi Antony,
Unfortunately, all executors for any single Spark application must have the
same amount of memory. It's possible to configure YARN with different
amounts of memory for each host (using
yarn.nodemanager.resource.memory-mb), so other apps might be able to take
advantage of the extra memory.
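That per-host knob would be set in each node's yarn-site.xml; a sketch, with
capacities invented for the boxes in this thread:

  <!-- yarn-site.xml on a 72GB node -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>65536</value>
  </property>
  <!-- a 36GB node might use 32768 instead -->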
If you’re running YARN, then you should be able to mix and match where YARN is
managing the resources available on the node.
Having said that… it depends on which version of Hadoop/YARN.
If you’re running Hortonworks and Ambari, then setting up multiple profiles may
not be straightforward.
I should have said I am running as yarn-client. All I can see is specifying the
generic executor memory that is then used in all containers.
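That single knob is spark.executor.memory, or equivalently the
--executor-memory flag; a minimal yarn-client sketch (the 3g value is just a
placeholder):

  # every executor container in the application gets the same 3GB
  spark-shell --master yarn-client --executor-memory 3g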
On Monday, 26 January 2015, 16:48, Charles Feduke wrote:
You should look at using Mesos. This should abstract away the individual
hosts into a pool of resources and make the different physical
specifications manageable.
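If you went that route, pointing Spark at a Mesos master is roughly the
following (the host name is a placeholder; 5050 is the conventional master
port):

  # executors draw from the pooled resources of the whole Mesos cluster
  spark-submit --master mesos://mesos-master.example.com:5050 \
    --executor-memory 3g --class com.example.MyApp myapp.jar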
I haven't tried configuring Spark Standalone mode to have different specs
on different machines but based on spark-env.sh.template:
# Options for the daemons used in the standalone deploy mode
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
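So in principle those standalone worker settings could differ per machine; a
sketch, with memory figures invented for the two box sizes in this thread
(leaving some headroom for the OS):

  # spark-env.sh on a 36GB box
  SPARK_WORKER_CORES=10
  SPARK_WORKER_MEMORY=32g

  # spark-env.sh on a 72GB box
  SPARK_WORKER_CORES=10
  SPARK_WORKER_MEMORY=68g

Each application's executors are still sized by spark.executor.memory, so the
bigger worker simply hosts more of them.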