Hi Axel, what Spark version are you using? Also, what do your configurations look like for the following?
- spark.cores.max (also --total-executor-cores)
- spark.executor.cores (also --executor-cores)

2015-08-19 9:27 GMT-07:00 Axel Dahl <a...@whisperstream.com>:

> Hmm, maybe I spoke too soon.
>
> I have an Apache Zeppelin instance running and have configured it to use
> 48 cores (each node only has 16 cores), so I figured setting it to 48
> would mean that Spark would grab 3 nodes. What happens instead, though,
> is that Spark reports that 48 cores are being used but executes
> everything on 1 node; it looks like it's not grabbing the extra nodes.
>
> On Wed, Aug 19, 2015 at 8:43 AM, Axel Dahl <a...@whisperstream.com> wrote:
>
>> That worked great, thanks Andrew.
>>
>> On Tue, Aug 18, 2015 at 1:39 PM, Andrew Or <and...@databricks.com> wrote:
>>
>>> Hi Axel,
>>>
>>> You can try setting `spark.deploy.spreadOut` to false (through your
>>> conf/spark-defaults.conf file). What this does is essentially try to
>>> schedule as many cores on one worker as possible before spilling over
>>> to other workers. Note that you *must* restart the cluster through the
>>> sbin scripts.
>>>
>>> For more information see:
>>> http://spark.apache.org/docs/latest/spark-standalone.html
>>>
>>> Feel free to let me know whether it works,
>>> -Andrew
>>>
>>> 2015-08-18 4:49 GMT-07:00 Igor Berman <igor.ber...@gmail.com>:
>>>
>>>> By default, standalone mode creates 1 executor on every worker machine
>>>> per application. The overall number of cores is configured with
>>>> --total-executor-cores, so in general, if you specify
>>>> --total-executor-cores=1, there will be only 1 core on some executor
>>>> and you'll get what you want.
>>>>
>>>> On the other hand, if your application needs all the cores of your
>>>> cluster and only some specific job should run on a single executor,
>>>> there are a few methods to achieve this, e.g. coalesce(1) or
>>>> dummyRddWithOnePartitionOnly.foreachPartition.
>>>>
>>>> On 18 August 2015 at 01:36, Axel Dahl <a...@whisperstream.com> wrote:
>>>>
>>>>> I have a 4-node cluster and have been playing around with the
>>>>> num-executors, executor-memory and executor-cores parameters.
>>>>>
>>>>> I set the following:
>>>>> --executor-memory=10G
>>>>> --num-executors=1
>>>>> --executor-cores=8
>>>>>
>>>>> But when I run the job, I see that each worker is running one
>>>>> executor which has 2 cores and 2.5G memory.
>>>>>
>>>>> What I'd like to do instead is have Spark allocate the job to a
>>>>> single worker node.
>>>>>
>>>>> Is that possible in standalone mode, or do I need a job/resource
>>>>> scheduler like YARN to do that?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> -Axel
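
To put the fix that worked in this thread into one concrete, minimal sketch: the property and flags below come from the thread itself, but the master URL, $SPARK_HOME path, resource sizes and application file (your_app.py) are placeholders, not values anyone posted here.

    # In conf/spark-defaults.conf on the standalone master: pack an
    # application's cores onto as few workers as possible instead of
    # spreading them across the cluster (default is true).
    spark.deploy.spreadOut    false

    # Restart the standalone cluster through the sbin scripts so the
    # master picks up the change (restart is required, per Andrew's note).
    $SPARK_HOME/sbin/stop-all.sh
    $SPARK_HOME/sbin/start-all.sh

    # Then cap the application's total cores so it fits on one 16-core
    # worker. spark://master:7077, 10G and your_app.py are placeholders.
    $SPARK_HOME/bin/spark-submit \
      --master spark://master:7077 \
      --total-executor-cores 16 \
      --executor-memory 10G \
      your_app.py

With spreadOut disabled, the standalone master fills up one worker's cores before spilling over to the next, so an application that requests no more cores than a single node offers (16 here) should land on one worker, while a request for 48 cores will still span three 16-core nodes.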