Correct: what I do to start workers is the equivalent of start-slaves.sh. It ends up running the same command on the worker servers as start-slaves.sh does.
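Concretely, that per-host command boils down to roughly the following; the master host and port here are placeholders, and the flags simply mirror the start-slave.sh example quoted further down the thread:

    # Run this on each worker host, instead of listing hosts in conf/slaves
    # and calling sbin/start-slaves.sh (master URL is a placeholder).
    MASTER=spark://my.local.server:7077
    $SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g $MASTER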
It definitely uses all workers, and workers starting later pick up work as well. If you have a long-running job, you can add workers dynamically and they will pick up work as long as there are enough partitions to go around. I set spark.locality.wait to 0 so that workers never wait to pick up tasks.

On Fri, May 20, 2016 at 2:57 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> OK, this is basically from my notes for Spark standalone. The worker process is the slave process.
>
> [inline image omitted]
>
> You start the workers as you showed:
>
> $SPARK_HOME/sbin/start-slaves.sh
>
> That picks up the worker host names from the $SPARK_HOME/conf/slaves file, so you still have to tell Spark where to run workers.
>
> However, if I am correct, regardless of what you have specified in slaves, in this standalone mode there will not be any Spark process spawned by the driver on the slaves. In all probability you will be running one spark-submit process on the driver node. You can see this through the output of
>
> jps | grep SparkSubmit
>
> and you will see the details by running jmonitor for that SparkSubmit job.
>
> However, I still doubt whether Scheduling Across Applications is feasible in standalone mode. The doc says:
>
> Standalone mode: By default, applications submitted to the standalone mode cluster will run in FIFO (first-in-first-out) order, and each application will try to use all available nodes. You can limit the number of nodes an application uses by setting the spark.cores.max configuration property in it, or change the default for applications that don't set this setting through spark.deploy.defaultCores. Finally, in addition to controlling cores, each application's spark.executor.memory setting controls its memory use.
>
> It uses the words "all available nodes", but I am not convinced it will actually use those nodes. Perhaps someone can clarify this.
>
> HTH
>
> Dr Mich Talebzadeh
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 20 May 2016 at 02:03, Mathieu Longtin <math...@closetwork.org> wrote:
>
>> Okay:
>> host=my.local.server
>> port=someport
>>
>> This is the spark-submit command, which runs on my local server:
>>
>> $SPARK_HOME/bin/spark-submit --master spark://$host:$port --executor-memory 4g python-script.py with args
>>
>> If I want 200 worker cores, I tell the cluster scheduler to run this command on 200 cores:
>>
>> $SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port
>>
>> That's it. When the task starts, it uses all available workers. If for some reason not enough cores are available immediately, it still starts processing with whatever it gets, and the load will be spread further as workers come online.
>>
>> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> In a normal operation we tell Spark which nodes the worker processes can run on by adding the node names to conf/slaves.
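For comparison, a minimal sketch of that conventional conf/slaves setup; the host names below are placeholders, not values from this thread:

    # One worker host per line in $SPARK_HOME/conf/slaves:
    printf 'worker-node-01\nworker-node-02\n' > $SPARK_HOME/conf/slaves
    # Then start a worker on every listed host (the script reaches each host over ssh):
    $SPARK_HOME/sbin/start-slaves.sh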
>>> Not very clear on this. In your case, do all the jobs run locally, with say 100 executor cores, like below?
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>   --master local[*] \
>>>   --driver-memory xg \          (default would be 512M)
>>>   --num-executors=1 \           (this is the constraint in a stand-alone Spark cluster, whether specified or not)
>>>   --executor-memory=xG \
>>>   --executor-cores=n \
>>>
>>> --master local[*] means all cores, so in your case --executor-cores need not be specified? Or you can cap it as above with --executor-cores=n. If it is not specified then the Spark app will go and grab every core, although in practice that does not happen; it is just an upper ceiling. It is FIFO.
>>>
>>> What executor memory is typically specified in your case?
>>>
>>> Do you have a sample snapshot of a spark-submit job by any chance, Mathieu?
>>>
>>> Cheers
>>>
>>> On 20 May 2016 at 00:27, Mathieu Longtin <math...@closetwork.org> wrote:
>>>
>>>> Mostly, the resource management is not up to the Spark master.
>>>>
>>>> We routinely start 100 executor-cores for a 5-minute job, and they just quit when they are done. Then those processor cores can do something else entirely; they are not reserved for Spark at all.
>>>>
>>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Then in theory every user can fire multiple spark-submit jobs. Do you cap it with settings in $SPARK_HOME/conf/spark-defaults.conf? I guess in reality every user submits one job only.
>>>>>
>>>>> This is an interesting model for two reasons:
>>>>> - It uses parallel processing across all the nodes, or most of the nodes, to minimise the processing time.
>>>>> - It requires less intervention.
>>>>>
>>>>> On 19 May 2016 at 21:33, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>
>>>>>> Driver memory is default. Executor memory depends on the job; the caller decides how much memory to use. We don't specify --num-executors as we want all cores assigned to the local master, since they were started by the current user. No local executor. --master=spark://localhost:someport. 1 core per executor.
>>>>>>
>>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Mathieu
>>>>>>>
>>>>>>> So it would be interesting to see what resources are allocated in your case, especially the num-executors and executor-cores. I gather every node has enough memory and cores.
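As an aside on the capping question a few messages up, a minimal sketch of a per-application core cap in standalone mode; the figures are illustrative, and $host/$port are the master coordinates from the command quoted earlier:

    # Either cluster-wide in $SPARK_HOME/conf/spark-defaults.conf:
    #   spark.cores.max        16
    #   spark.executor.memory  4g
    # or per submission:
    $SPARK_HOME/bin/spark-submit \
      --master spark://$host:$port \
      --conf spark.cores.max=16 \
      --executor-memory 4g \
      python-script.py with args
    # (spark.deploy.defaultCores on the master side sets the default for
    #  applications that leave spark.cores.max unset.)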
>>>>>>>
>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>   --master local[2] \
>>>>>>>   --driver-memory 4g \
>>>>>>>   --num-executors=1 \
>>>>>>>   --executor-memory=4G \
>>>>>>>   --executor-cores=2 \
>>>>>>>
>>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>
>>>>>>>> The driver (the process started by spark-submit) runs locally. The executors run on any of thousands of servers. So far, I haven't tried more than 500 executors.
>>>>>>>>
>>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>>
>>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> OK, so you are using some form of NFS-mounted file system shared among the nodes, and basically you start the processes through spark-submit.
>>>>>>>>>
>>>>>>>>> In stand-alone mode a simple cluster manager is included with Spark. It does the management of resources, so it is not clear to me what you are referring to as the worker manager here?
>>>>>>>>>
>>>>>>>>> This is my take on your model. The application will go and grab all the cores in the cluster. You only have one worker that lives within the driver JVM process. The driver node runs on the same host that the cluster manager is running on. The driver requests resources from the cluster manager to run tasks. In this case there is only one executor for the driver? The executor runs tasks for the driver.
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>>>
>>>>>>>>>> No master and no node manager, just the processes that do the actual work.
>>>>>>>>>>
>>>>>>>>>> We use the "stand alone" version because we have a shared file system and a way of allocating computing resources already (Univa Grid Engine). If an executor were to die, we have other ways of restarting it; we don't need the worker manager to deal with it.
>>>>>>>>>>
>>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mathieu
>>>>>>>>>>>
>>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>>
>>>>>>>>>>> So basically each node has its own master in this model?
>>>>>>>>>>>
>>>>>>>>>>> Are these supposed to be individual stand-alone servers?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
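On the Univa Grid Engine dispatch mentioned above, a hedged sketch of how the 200 one-core workers could be requested as an array job; the qsub options shown are an assumption, since the actual dispatch mechanism is site-specific:

    # Ask the scheduler for 200 one-core slots, each starting a worker
    # pointed at the master (illustrative only; $host and $port are the
    # master coordinates from the command quoted earlier).
    qsub -t 1-200 -b y \
      $SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port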
>>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> First, a bit of context: we use Spark on a platform where each user starts workers as needed. This has the advantage that all permission management is handled by the OS, so the users can only read files they have permission to.
>>>>>>>>>>>>
>>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>>> - start a master
>>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>>> - the driver then talks to the master and tells it how many executors it needs
>>>>>>>>>>>> - the master tells the worker nodes to start executors, which talk to the driver
>>>>>>>>>>>> - the executors are started
>>>>>>>>>>>>
>>>>>>>>>>>> From here on, the master doesn't do much, and neither do the process managers on the worker nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>>> - start the driver program
>>>>>>>>>>>> - start executors on a number of servers, telling them where to find the driver
>>>>>>>>>>>> - the executors connect directly to the driver
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a way I could do this without the master and worker managers?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
--
Mathieu Longtin
1-514-803-8977
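For reference, a sketch of the current workflow listed in the first message, using the standard standalone scripts; the host names, port, resource figures, and the ssh loop are placeholders (in practice the cluster scheduler dispatches step 2):

    MASTER=spark://my.local.server:7077

    # 1. Start a master (on the same server as the driver).
    $SPARK_HOME/sbin/start-master.sh

    # 2. Start workers on a number of servers, pointing them at the master.
    #    (ssh is a stand-in here; the grid engine normally runs this per slot.)
    for host in worker-node-01 worker-node-02; do
      ssh $host "$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g $MASTER"
    done

    # 3. Submit the driver program against that master.
    $SPARK_HOME/bin/spark-submit \
      --master $MASTER \
      --executor-memory 4g \
      python-script.py with args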