If you want to run them always on the same machines, use YARN node labels. If 
any 10 machines will do, use the capacity or fair scheduler.
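
For example, a minimal sketch assuming a node label "spark" has already been 
created in YARN and assigned to those 10 machines (the label name, class, and 
jar below are placeholders):

    spark-submit \
      --master yarn \
      --conf spark.yarn.am.nodeLabelExpression=spark \
      --conf spark.yarn.executor.nodeLabelExpression=spark \
      --class com.example.MyApp myapp.jar   # placeholder app

The first constraint pins the application master to the labeled nodes, the 
second pins the executors.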

What is the use case for running it always on the same 10 machines? If it is 
for licensing reasons, then I would ask your vendor whether this is a suitable 
means to ensure license compliance. Otherwise, use a dedicated cluster.

> On 7 Feb 2017, at 12:09, Alvaro Brandon <alvarobran...@gmail.com> wrote:
> 
> Hello Pavel:
> 
> Thanks for the pointers. 
> 
> For the standalone cluster manager: I understand that I just have to start 
> several masters, each with a subset of slaves attached. Each master will then 
> listen on a different <hostname, port> pair, allowing me to spark-submit to 
> any of these pairs depending on the subset of machines I want to use.
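> 
> For instance, a rough sketch of what I have in mind (hostnames, ports, and 
> the app details are made up):
> 
>     # on the machine coordinating the 10-node subset
>     sbin/start-master.sh --host master-a --port 7077
> 
>     # on each of the 10 workers
>     sbin/start-slave.sh spark://master-a:7077
> 
>     # then submit only to that subset
>     spark-submit --master spark://master-a:7077 \
>       --class com.example.MyApp myapp.jar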
> 
> For Mesos: I haven't used Mesos much. Any references or documentation I can 
> use to set this up?
> 
> Best Regards
> 
> 
> 
> 2017-02-07 11:36 GMT+01:00 Pavel Plotnikov <pavel.plotni...@team.wrike.com>:
>> Hi, Alvaro
>> You can create different clusters using the standalone cluster manager, and 
>> then manage subsets of machines by submitting applications to the different 
>> masters. Or you can use Mesos attributes to mark a subset of workers and 
>> specify them in spark.mesos.constraints.
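>> 
>> A rough sketch (the attribute name/value, hosts, and app details are just 
>> placeholders):
>> 
>>     # when starting the Mesos agents on the chosen machines
>>     mesos-slave --master=mesos-master:5050 \
>>       --attributes="group:spark" --work_dir=/var/lib/mesos
>> 
>>     # then constrain the job to agents carrying that attribute
>>     spark-submit --master mesos://mesos-master:5050 \
>>       --conf spark.mesos.constraints="group:spark" \
>>       --class com.example.MyApp myapp.jar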
>> 
>> 
>>> On Tue, Feb 7, 2017 at 1:21 PM Alvaro Brandon <alvarobran...@gmail.com> 
>>> wrote:
>>> Hello all:
>>> 
>>> I have the following scenario. 
>>> - I have a cluster of 50 machines with Hadoop and Spark installed on them. 
>>> - I want to launch one Spark application through spark-submit. However, I 
>>> want this application to run on only a subset of these machines (e.g. 10 
>>> machines), disregarding data locality.
>>> 
>>> Is this possible? Is there any option in the standalone scheduler, YARN, or 
>>> Mesos that allows such a thing?
>>> 
>>> 
> 
