I actually have the same problem, but I am not sure whether it is a Spark
problem or a YARN problem.
I set up a five-node cluster on AWS EMR and started the YARN daemon on the
master (the node manager is not started on the master by default, and I don't
want to waste any resources since I have to pay for them). I then submitted
the Spark job in yarn-cluster mode. The command is:
./spark/bin/spark-submit --master yarn-cluster --num-executors 5
--executor-cores 4 --properties-file spark-application.conf myapp.py
But the YARN resource manager only created 4 containers on 4 nodes, and one
node was completely idle.
More details about the setup:
EMR node:
m3.xlarge: 16g ram 4 cores 40g ssd (HDFS on EBS?)
Yarn-site.xml:
yarn.scheduler.maximum-allocation-mb=11520
yarn.nodemanager.resource.memory-mb=11520
Spark-conf:
spark.executor.memory 10g
spark.storage.memoryFraction 0.2
spark.python.worker.memory 1500m
spark.akka.frameSize 200
spark.shuffle.memoryFraction 0.1
spark.driver.memory 10g
Hadoop behavior observed:
4 containers were created on four nodes, including the EMR master, but one
EMR slave stayed idle (memory consumption around 2g and 0% CPU usage).
Spark used one container for the driver on an EMR slave node (which makes
sense, since I requested that much memory).
The other three nodes were used for computing the tasks.
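
As a rough sanity check on the numbers above, the sketch below estimates how
large a YARN container each 10g JVM needs and how many of them fit on one
node. It is a back-of-the-envelope model, not Spark's or YARN's actual code:
the 7% / 384 MB overhead rule and the 1024 MB allocation increment are
assumptions (EMR may configure different values), and the function and
variable names are made up for illustration.

```python
import math

NODES = 5                      # cluster size from the post
NODE_MEMORY_MB = 11520         # yarn.nodemanager.resource.memory-mb
EXECUTOR_HEAP_MB = 10 * 1024   # spark.executor.memory 10g
DRIVER_HEAP_MB = 10 * 1024     # spark.driver.memory 10g


def container_mb(heap_mb, overhead_fraction=0.07, min_overhead_mb=384,
                 increment_mb=1024):
    """Estimate a container size: heap plus overhead, rounded up to the
    allocation increment. All three defaults are ASSUMED values."""
    overhead_mb = max(min_overhead_mb, int(heap_mb * overhead_fraction))
    requested_mb = heap_mb + overhead_mb
    return int(math.ceil(requested_mb / float(increment_mb))) * increment_mb


executor_container_mb = container_mb(EXECUTOR_HEAP_MB)
driver_container_mb = container_mb(DRIVER_HEAP_MB)

# How many executor-sized containers fit into one node manager's memory.
containers_per_node = NODE_MEMORY_MB // executor_container_mb

# In yarn-cluster mode the driver runs in its own container. Here it is the
# same size as an executor container, so it takes one of the slots.
max_executors = NODES * containers_per_node - 1

print("executor container: %d MB" % executor_container_mb)
print("containers per node: %d" % containers_per_node)
print("max executors next to the driver: %d" % max_executors)
```

Under these assumptions each node can hold only one 10g container, so with
the driver occupying one node, at most four executors can be placed no matter
what --num-executors says. This does not explain why only three executors
started; that part remains open.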
If YARN can't use all the nodes and I still have to pay for them, it's
just a big waste :p
Any thoughts on this?
Many thanks,
Ed
2015-05-18 12:07 GMT-04:00 Sandy Ryza <[email protected]>:
> *All
>
> On Mon, May 18, 2015 at 9:07 AM, Sandy Ryza <[email protected]>
> wrote:
>
>> Hi Xiaohe,
>>
>> The all Spark options must go before the jar or they won't take effect.
>>
>> -Sandy
>>
>> On Sun, May 17, 2015 at 8:59 AM, xiaohe lan <[email protected]>
>> wrote:
>>
>>> Sorry, both of them were actually assigned tasks.
>>>
>>> Aggregated Metrics by Executor
>>>
>>> Executor ID                   1                    2
>>> Address                       host1:6184           host2:62072
>>> Task Time                     1.7 min              1.7 min
>>> Total Tasks                   5                    5
>>> Failed Tasks                  0                    0
>>> Succeeded Tasks               5                    5
>>> Input Size / Records          640.0 MB / 12318400  640.0 MB / 12014510
>>> Shuffle Write Size / Records  382.3 MB / 12100770  386.0 MB / 10926912
>>> Shuffle Spill (Memory)        1630.4 MB            1646.6 MB
>>> Shuffle Spill (Disk)          295.4 MB             304.8 MB
>>>
>>> On Sun, May 17, 2015 at 11:50 PM, xiaohe lan <[email protected]>
>>> wrote:
>>>
>>>> bash-4.1$ ps aux | grep SparkSubmit
>>>> xilan 1704 13.2 1.2 5275520 380244 pts/0 Sl+ 08:39 0:13
>>>> /scratch/xilan/jdk1.8.0_45/bin/java -cp
>>>> /scratch/xilan/spark/conf:/scratch/xilan/spark/lib/spark-assembly-1.3.1-hadoop2.4.0.jar:/scratch/xilan/spark/lib/datanucleus-core-3.2.10.jar:/scratch/xilan/spark/lib/datanucleus-api-jdo-3.2.6.jar:/scratch/xilan/spark/lib/datanucleus-rdbms-3.2.9.jar:/scratch/xilan/hadoop/etc/hadoop
>>>> -Xms512m -Xmx512m org.apache.spark.deploy.SparkSubmit --master yarn
>>>> target/scala-2.10/simple-project_2.10-1.0.jar --class scala.SimpleApp
>>>> --num-executors 5 --executor-cores 4
>>>> xilan 1949 0.0 0.0 103292 800 pts/1 S+ 08:40 0:00 grep
>>>> --color SparkSubmit
>>>>
>>>>
>>>> When I look at the Spark UI, I see the following:
>>>> Aggregated Metrics by Executor
>>>>
>>>> Executor ID                  1                   2
>>>> Address                      host1:30483         host2:49970
>>>> Task Time                    6 s                 0 ms
>>>> Total Tasks                  1                   0
>>>> Failed Tasks                 0                   0
>>>> Succeeded Tasks              1                   0
>>>> Shuffle Read Size / Records  127.1 MB / 2808978  63.4 MB / 1810945
>>>>
>>>> So executor 2 is not even assigned a task? Maybe there is a problem
>>>> with my settings, but I don't know which settings I might have set
>>>> incorrectly or left unset.
>>>>
>>>>
>>>> Thanks,
>>>> Xiaohe
>>>>
>>>> On Sun, May 17, 2015 at 11:16 PM, Akhil Das <[email protected]>
>>>> wrote:
>>>>
>>>>> Did you try the --executor-cores parameter? While the job is being
>>>>> submitted, run ps aux | grep spark-submit to see the exact command
>>>>> parameters.
>>>>>
>>>>> Thanks
>>>>> Best Regards
>>>>>
>>>>> On Sat, May 16, 2015 at 12:31 PM, xiaohe lan <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a 5-node YARN cluster, and I used spark-submit to submit a
>>>>>> simple app.
>>>>>>
>>>>>> spark-submit --master yarn
>>>>>> target/scala-2.10/simple-project_2.10-1.0.jar --class scala.SimpleApp
>>>>>> --num-executors 5
>>>>>>
>>>>>> I set the number of executors to 5, but in the Spark UI I could see
>>>>>> only two executors, and the job ran very slowly. What did I miss?
>>>>>>
>>>>>> Thanks,
>>>>>> Xiaohe
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
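
Sandy's point that options placed after the jar do not take effect can be
illustrated with a simplified model of how the command line is split. The
helper below is hypothetical and is not spark-submit's real parser: it
assumes every option takes exactly one value, treats the first non-option
argument as the application jar, and hands everything after it to the
application itself.

```python
def split_submit_args(argv):
    """Split argv into (spark_options, primary_resource, app_args).

    Simplified, hypothetical model: every "--option" is assumed to be
    followed by exactly one value.
    """
    options = {}
    i = 0
    while i < len(argv) and argv[i].startswith("--"):
        if i + 1 >= len(argv):
            raise ValueError("option %s has no value" % argv[i])
        options[argv[i]] = argv[i + 1]
        i += 2
    resource = argv[i] if i < len(argv) else None
    app_args = argv[i + 1:]
    return options, resource, app_args


JAR = "target/scala-2.10/simple-project_2.10-1.0.jar"

# The command as originally submitted: options follow the jar.
original = ["--master", "yarn", JAR,
            "--class", "scala.SimpleApp", "--num-executors", "5"]

# The same command with all options moved in front of the jar.
reordered = ["--master", "yarn", "--class", "scala.SimpleApp",
             "--num-executors", "5", JAR]

orig_opts, orig_jar, orig_app_args = split_submit_args(original)
new_opts, new_jar, new_app_args = split_submit_args(reordered)

print(orig_opts, orig_app_args)
print(new_opts, new_app_args)
```

In this model the original ordering leaves only --master as a Spark option
and passes --class and --num-executors to the application as plain
arguments, while the reordered command exposes all three as options.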