Correct: what I do to start workers is the equivalent of start-slaves.sh. It ends up running the same command on the worker servers as start-slaves.sh does.
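Concretely, that per-host command boils down to roughly the following; the master host and port here are placeholders, and the flags simply mirror the start-slave.sh example quoted further down the thread:

    # Run this on each worker host, instead of listing hosts in conf/slaves
    # and calling sbin/start-slaves.sh (master URL is a placeholder).
    MASTER=spark://my.local.server:7077
    $SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g $MASTER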
It definitely uses all workers, and workers starting later pick up work as well. If you have a long-running job, you can add workers dynamically and they will pick up work as long as there are enough partitions to go around. I set spark.locality.wait to 0 so that workers never wait to pick up tasks.

On Fri, May 20, 2016 at 2:57 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> OK, this is basically from my notes for Spark standalone. The worker process is the slave process.
>
> [inline image omitted]
>
> You start the workers as you showed:
>
> $SPARK_HOME/sbin/start-slaves.sh
>
> That picks up the worker host names from the $SPARK_HOME/conf/slaves file, so you still have to tell Spark where to run workers.
>
> However, if I am correct, regardless of what you have specified in slaves, in this standalone mode there will not be any Spark process spawned by the driver on the slaves. In all probability you will be running one spark-submit process on the driver node. You can see this through the output of
>
> jps | grep SparkSubmit
>
> and you will see the details by running jmonitor for that SparkSubmit job.
>
> However, I still doubt whether Scheduling Across Applications is feasible in standalone mode. The doc says:
>
> Standalone mode: By default, applications submitted to the standalone mode cluster will run in FIFO (first-in-first-out) order, and each application will try to use all available nodes. You can limit the number of nodes an application uses by setting the spark.cores.max configuration property in it, or change the default for applications that don't set this setting through spark.deploy.defaultCores. Finally, in addition to controlling cores, each application's spark.executor.memory setting controls its memory use.
>
> It uses the words "all available nodes", but I am not convinced it will actually use those nodes. Perhaps someone can clarify this.
>
> HTH
>
> Dr Mich Talebzadeh
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 20 May 2016 at 02:03, Mathieu Longtin <math...@closetwork.org> wrote:
>
>> Okay:
>> host=my.local.server
>> port=someport
>>
>> This is the spark-submit command, which runs on my local server:
>>
>> $SPARK_HOME/bin/spark-submit --master spark://$host:$port --executor-memory 4g python-script.py with args
>>
>> If I want 200 worker cores, I tell the cluster scheduler to run this command on 200 cores:
>>
>> $SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port
>>
>> That's it. When the task starts, it uses all available workers. If for some reason not enough cores are available immediately, it still starts processing with whatever it gets, and the load will be spread further as workers come online.
>>
>> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> In a normal operation we tell Spark which nodes the worker processes can run on by adding the node names to conf/slaves.
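For comparison, a minimal sketch of that conventional conf/slaves setup; the host names below are placeholders, not values from this thread:

    # One worker host per line in $SPARK_HOME/conf/slaves:
    printf 'worker-node-01\nworker-node-02\n' > $SPARK_HOME/conf/slaves
    # Then start a worker on every listed host (the script reaches each host over ssh):
    $SPARK_HOME/sbin/start-slaves.sh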
>>> Not very clear on this. In your case, do all the jobs run locally, with say 100 executor cores, like below?
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>   --master local[*] \
>>>   --driver-memory xg \          (default would be 512M)
>>>   --num-executors=1 \           (this is the constraint in a stand-alone Spark cluster, whether specified or not)
>>>   --executor-memory=xG \
>>>   --executor-cores=n \
>>>
>>> --master local[*] means all cores, so in your case --executor-cores need not be specified? Or you can cap it as above with --executor-cores=n. If it is not specified then the Spark app will go and grab every core, although in practice that does not happen; it is just an upper ceiling. It is FIFO.
>>>
>>> What executor memory is typically specified in your case?
>>>
>>> Do you have a sample snapshot of a spark-submit job by any chance, Mathieu?
>>>
>>> Cheers
>>>
>>> On 20 May 2016 at 00:27, Mathieu Longtin <math...@closetwork.org> wrote:
>>>
>>>> Mostly, the resource management is not up to the Spark master.
>>>>
>>>> We routinely start 100 executor-cores for a 5-minute job, and they just quit when they are done. Then those processor cores can do something else entirely; they are not reserved for Spark at all.
>>>>
>>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Then in theory every user can fire multiple spark-submit jobs. Do you cap it with settings in $SPARK_HOME/conf/spark-defaults.conf? I guess in reality every user submits one job only.
>>>>>
>>>>> This is an interesting model for two reasons:
>>>>> - It uses parallel processing across all the nodes, or most of the nodes, to minimise the processing time.
>>>>> - It requires less intervention.
>>>>>
>>>>> On 19 May 2016 at 21:33, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>
>>>>>> Driver memory is default. Executor memory depends on the job; the caller decides how much memory to use. We don't specify --num-executors as we want all cores assigned to the local master, since they were started by the current user. No local executor. --master=spark://localhost:someport. 1 core per executor.
>>>>>>
>>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Mathieu
>>>>>>>
>>>>>>> So it would be interesting to see what resources are allocated in your case, especially the num-executors and executor-cores. I gather every node has enough memory and cores.
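As an aside on the capping question a few messages up, a minimal sketch of a per-application core cap in standalone mode; the figures are illustrative, and $host/$port are the master coordinates from the command quoted earlier:

    # Either cluster-wide in $SPARK_HOME/conf/spark-defaults.conf:
    #   spark.cores.max        16
    #   spark.executor.memory  4g
    # or per submission:
    $SPARK_HOME/bin/spark-submit \
      --master spark://$host:$port \
      --conf spark.cores.max=16 \
      --executor-memory 4g \
      python-script.py with args
    # (spark.deploy.defaultCores on the master side sets the default for
    #  applications that leave spark.cores.max unset.)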
>>>>>>>
>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>   --master local[2] \
>>>>>>>   --driver-memory 4g \
>>>>>>>   --num-executors=1 \
>>>>>>>   --executor-memory=4G \
>>>>>>>   --executor-cores=2 \
>>>>>>>
>>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>
>>>>>>>> The driver (the process started by spark-submit) runs locally. The executors run on any of thousands of servers. So far, I haven't tried more than 500 executors.
>>>>>>>>
>>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>>
>>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> OK, so you are using some form of NFS-mounted file system shared among the nodes, and basically you start the processes through spark-submit.
>>>>>>>>>
>>>>>>>>> In stand-alone mode a simple cluster manager is included with Spark. It does the management of resources, so it is not clear to me what you are referring to as the worker manager here?
>>>>>>>>>
>>>>>>>>> This is my take on your model. The application will go and grab all the cores in the cluster. You only have one worker that lives within the driver JVM process. The driver node runs on the same host that the cluster manager is running on. The driver requests resources from the cluster manager to run tasks. In this case there is only one executor for the driver? The executor runs tasks for the driver.
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>>>
>>>>>>>>>> No master and no node manager, just the processes that do the actual work.
>>>>>>>>>>
>>>>>>>>>> We use the "stand alone" version because we have a shared file system and a way of allocating computing resources already (Univa Grid Engine). If an executor were to die, we have other ways of restarting it; we don't need the worker manager to deal with it.
>>>>>>>>>>
>>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Mathieu
>>>>>>>>>>>
>>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>>
>>>>>>>>>>> So basically each node has its own master in this model?
>>>>>>>>>>>
>>>>>>>>>>> Are these supposed to be individual stand-alone servers?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
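On the Univa Grid Engine dispatch mentioned above, a hedged sketch of how the 200 one-core workers could be requested as an array job; the qsub options shown are an assumption, since the actual dispatch mechanism is site-specific:

    # Ask the scheduler for 200 one-core slots, each starting a worker
    # pointed at the master (illustrative only; $host and $port are the
    # master coordinates from the command quoted earlier).
    qsub -t 1-200 -b y \
      $SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port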
>>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> First, a bit of context: we use Spark on a platform where each user starts workers as needed. This has the advantage that all permission management is handled by the OS, so the users can only read files they have permission to.
>>>>>>>>>>>>
>>>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>>>> - start a master
>>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>>> - the driver then talks to the master and tells it how many executors it needs
>>>>>>>>>>>> - the master tells the worker nodes to start executors, which talk to the driver
>>>>>>>>>>>> - the executors are started
>>>>>>>>>>>>
>>>>>>>>>>>> From here on, the master doesn't do much, and neither do the process managers on the worker nodes.
>>>>>>>>>>>>
>>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>>> - start the driver program
>>>>>>>>>>>> - start executors on a number of servers, telling them where to find the driver
>>>>>>>>>>>> - the executors connect directly to the driver
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a way I could do this without the master and worker managers?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
--
Mathieu Longtin
1-514-803-8977
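For reference, a sketch of the current workflow listed in the first message, using the standard standalone scripts; the host names, port, resource figures, and the ssh loop are placeholders (in practice the cluster scheduler dispatches step 2):

    MASTER=spark://my.local.server:7077

    # 1. Start a master (on the same server as the driver).
    $SPARK_HOME/sbin/start-master.sh

    # 2. Start workers on a number of servers, pointing them at the master.
    #    (ssh is a stand-in here; the grid engine normally runs this per slot.)
    for host in worker-node-01 worker-node-02; do
      ssh $host "$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g $MASTER"
    done

    # 3. Submit the driver program against that master.
    $SPARK_HOME/bin/spark-submit \
      --master $MASTER \
      --executor-memory 4g \
      python-script.py with args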