Hmm, maybe I spoke too soon. I have an Apache Zeppelin instance running and have configured it to use 48 cores (each node only has 16 cores), so I figured setting it to 48 would mean Spark would grab 3 nodes. What happens instead is that Spark reports 48 cores in use but executes everything on 1 node; it looks like it's not grabbing the extra nodes.
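
Roughly, the settings in play look like this (a sketch only; the 48-core cap is shown here via the standard spark.cores.max property, which may not be exactly how Zeppelin passes it, and spark.deploy.spreadOut is carried over from the suggestion below):

    # conf/spark-defaults.conf (sketch)
    spark.cores.max         48      # total cores the application may take across the cluster
    spark.deploy.spreadOut  false   # pack cores onto as few workers as possible before spilling over

    # expectation: with 16 cores per worker, 48 cores should spill across 3 workers,
    # yet the master reports 48 cores in use while all tasks land on a single node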
On Wed, Aug 19, 2015 at 8:43 AM, Axel Dahl <a...@whisperstream.com> wrote:

> That worked great, thanks Andrew.
>
> On Tue, Aug 18, 2015 at 1:39 PM, Andrew Or <and...@databricks.com> wrote:
>
>> Hi Axel,
>>
>> You can try setting `spark.deploy.spreadOut` to false (through your
>> conf/spark-defaults.conf file). What this does is essentially try to
>> schedule as many cores on one worker as possible before spilling over to
>> other workers. Note that you *must* restart the cluster through the sbin
>> scripts.
>>
>> For more information see:
>> http://spark.apache.org/docs/latest/spark-standalone.html.
>>
>> Feel free to let me know whether it works,
>> -Andrew
>>
>> 2015-08-18 4:49 GMT-07:00 Igor Berman <igor.ber...@gmail.com>:
>>
>>> By default, standalone mode creates 1 executor on every worker machine
>>> per application. The overall number of cores is configured with
>>> --total-executor-cores, so in general if you specify
>>> --total-executor-cores=1 there would be only 1 core on some executor and
>>> you'll get what you want.
>>>
>>> On the other hand, if your application needs all the cores of your
>>> cluster and only some specific job should run on a single executor,
>>> there are a few methods to achieve this, e.g. coalesce(1) or
>>> dummyRddWithOnePartitionOnly.foreachPartition.
>>>
>>> On 18 August 2015 at 01:36, Axel Dahl <a...@whisperstream.com> wrote:
>>>
>>>> I have a 4-node cluster and have been playing around with the
>>>> num-executors, executor-memory and executor-cores parameters.
>>>>
>>>> I set the following:
>>>> --executor-memory=10G
>>>> --num-executors=1
>>>> --executor-cores=8
>>>>
>>>> But when I run the job, I see that each worker is running one executor
>>>> which has 2 cores and 2.5G memory.
>>>>
>>>> What I'd like to do instead is have Spark allocate the job to a single
>>>> worker node. Is that possible in standalone mode, or do I need a
>>>> job/resource scheduler like YARN to do that?
>>>>
>>>> Thanks in advance,
>>>>
>>>> -Axel
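
P.S. For completeness, a rough sketch of the two approaches described above for standalone mode (the master URL, class name and jar are placeholders, not from this thread):

    # cap the application's total core budget so it fits on fewer workers
    spark-submit \
      --master spark://master-host:7077 \
      --total-executor-cores 16 \
      --executor-memory 10G \
      --class com.example.MyApp \
      my-app.jar

And, per Igor's note, if the application should keep the whole cluster but one specific job must run on a single executor, shrink that job to one partition first (Scala, runnable in spark-shell where sc is the SparkContext):

    val data = sc.parallelize(1 to 100)     // stand-in for the real RDD
    val oneExecutor = data.coalesce(1)      // one partition => its tasks run on a single executor
    oneExecutor.foreachPartition(iter => iter.foreach(println))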