Okay: host=my.local.server, port=someport.

This is the spark-submit command, which runs on my local server:

$SPARK_HOME/bin/spark-submit --master spark://$host:$port --executor-memory 4g python-script.py with args
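For reference, here is a rough sketch of how the two sides fit together, with the worker-launch command described in the next paragraph. It assumes a standalone master is already listening at spark://$host:$port; the loop and the "scheduler-submit" command are illustrative stand-ins for however your cluster scheduler runs a single-core job, not an exact recipe.

#!/bin/bash
# Sketch only: ask the scheduler for 200 one-core workers, then submit the
# job against the standalone master at spark://$host:$port.
host=my.local.server
port=someport        # placeholder, as above
ncores=200

# "scheduler-submit" is a hypothetical stand-in for your grid engine's
# job-submission command (e.g. a qsub-style wrapper).
for i in $(seq 1 "$ncores"); do
  scheduler-submit \
    "$SPARK_HOME/sbin/start-slave.sh" --cores=1 --memory=4g "spark://$host:$port"
done

# Submit the driver from the local server; it starts with whatever workers
# have registered so far and picks up the rest as they come online.
"$SPARK_HOME/bin/spark-submit" \
  --master "spark://$host:$port" \
  --executor-memory 4g \
  python-script.py "$@"

Each worker here is a single one-core process, so asking for 200 cores just means submitting the launch command 200 times; nothing stays reserved once the job finishes.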
If I want 200 worker cores, I tell the cluster scheduler to run this command on 200 cores:

$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g spark://$host:$port

That's it. When the task starts, it uses all available workers. If for some reason not enough cores are available immediately, it still starts processing with whatever it gets, and the load is spread further as workers come online.

On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> In normal operation we tell Spark which nodes the worker processes can run on by adding the node names to conf/slaves.
>
> It is not very clear to me: in your case, do all the jobs run locally with, say, 100 executor cores, like below?
>
> ${SPARK_HOME}/bin/spark-submit \
>     --master local[*] \
>     --driver-memory xg \      -- default would be 512M
>     --num-executors=1 \       -- this is the constraint in a stand-alone Spark cluster, whether specified or not
>     --executor-memory=xG \
>     --executor-cores=n \
>
> --master local[*] means all cores, so --executor-cores in your case need not be specified? Or you can cap it like above with --executor-cores=n. If it is not specified then the Spark app will go and grab every core, although in practice that does not happen; it is just an upper ceiling. It is FIFO.
>
> What typical executor memory is specified in your case?
>
> Do you have a sample snapshot of a spark-submit job by any chance, Mathieu?
>
> Cheers
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 20 May 2016 at 00:27, Mathieu Longtin <math...@closetwork.org> wrote:
>
>> Mostly, the resource management is not up to the Spark master.
>>
>> We routinely start 100 executor-cores for a 5-minute job, and they just quit when they are done. Then those processor cores can do something else entirely; they are not reserved for Spark at all.
>>
>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Then in theory every user can fire multiple spark-submit jobs. Do you cap it with settings in $SPARK_HOME/conf/spark-defaults.conf? I guess in reality every user submits one job only.
>>>
>>> This is an interesting model for two reasons:
>>>
>>> - It uses parallel processing across all the nodes, or most of the nodes, to minimise the processing time
>>> - It requires less intervention
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 19 May 2016 at 21:33, Mathieu Longtin <math...@closetwork.org> wrote:
>>>
>>>> Driver memory is default. Executor memory depends on the job; the caller decides how much memory to use. We don't specify --num-executors, as we want all cores assigned to the local master, since they were started by the current user. No local executor. --master=spark://localhost:someport. 1 core per executor.
>>>>
>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Thanks Mathieu
>>>>>
>>>>> So it would be interesting to see what resources are allocated in your case, especially the num-executors and executor-cores.
>>>>> I gather every node has enough memory and cores.
>>>>>
>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>     --master local[2] \
>>>>>     --driver-memory 4g \
>>>>>     --num-executors=1 \
>>>>>     --executor-memory=4G \
>>>>>     --executor-cores=2 \
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>
>>>>>> The driver (the process started by spark-submit) runs locally. The executors run on any of thousands of servers. So far, I haven't tried more than 500 executors.
>>>>>>
>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>
>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> OK, so you are using some form of NFS-mounted file system shared among the nodes, and basically you start the processes through spark-submit.
>>>>>>>
>>>>>>> In stand-alone mode, a simple cluster manager included with Spark does the management of resources, so it is not clear to me what you are referring to as the worker manager here.
>>>>>>>
>>>>>>> This is my take on your model: the application will go and grab all the cores in the cluster. You only have one worker that lives within the driver JVM process. The driver node runs on the same host that the cluster manager is running on. The driver requests resources from the cluster manager to run tasks. In this case there is only one executor for the driver? The executor runs tasks for the driver.
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>
>>>>>>>> No master and no node manager, just the processes that do the actual work.
>>>>>>>>
>>>>>>>> We use the "stand alone" version because we have a shared file system and a way of allocating computing resources already (Univa Grid Engine). If an executor were to die, we have other ways of restarting it; we don't need the worker manager to deal with it.
>>>>>>>>
>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Mathieu
>>>>>>>>>
>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>
>>>>>>>>> So basically each node has its own master in this model.
>>>>>>>>>
>>>>>>>>> Are these supposed to be individual stand-alone servers?
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>>>>>
>>>>>>>>>> First, a bit of context: we use Spark on a platform where each user starts workers as needed. This has the advantage that all permission management is handled by the OS, so the users can only read files they have permission to.
>>>>>>>>>>
>>>>>>>>>> To do this, we have some utility that does the following:
>>>>>>>>>> - start a master
>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>> - the driver then talks to the master, telling it how many executors it needs
>>>>>>>>>> - the master tells the worker nodes to start executors and talk to the driver
>>>>>>>>>> - the executors are started
>>>>>>>>>>
>>>>>>>>>> From here on, the master doesn't do much, and neither does the process manager on the worker nodes.
>>>>>>>>>>
>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>> - start the driver program
>>>>>>>>>> - start executors on a number of servers, telling them where to find the driver
>>>>>>>>>> - the executors connect directly to the driver
>>>>>>>>>>
>>>>>>>>>> Is there a way I could do this without the master and worker managers?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Mathieu Longtin
>>>>>>>>>> 1-514-803-8977

--
Mathieu Longtin
1-514-803-8977