You should also check your memory usage. Say, for example, you have 16 cores and 8 GB of memory, and you use 4 executors with 1 core each. When you use an executor, Spark requests it from YARN, and YARN allocates the requested number of cores (1 in our case) and the memory. The memory is actually more than you asked for: if you ask for 1 GB, YARN will in fact allocate almost 1.5 GB once the overhead is added. In addition, it will probably allocate a container for the driver as well (probably with 1024 MB of memory). When you run your program and look at port 8088 (the YARN ResourceManager UI), you should look not only at VCores Used out of VCores Total, but also at Memory Used out of Memory Total. You should also navigate to the executors (e.g. Applications -> RUNNING on the left, then choose your application and navigate all the way down to a single container). There you can see the actual usage.
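To see where the "almost 1.5 GB" comes from, here is a rough back-of-the-envelope sketch (not Spark's actual code), assuming the usual default overhead rule of max(384 MB, 10% of the executor memory):

// Rough estimate of the YARN container size requested per executor.
// Assumes the default overhead rule max(384 MB, 10% of executor memory);
// the exact config key and rounding depend on your Spark and YARN versions.
object ContainerSizeEstimate {
  def containerSizeMb(executorMemoryMb: Int): Int = {
    val overheadMb = math.max(384, (executorMemoryMb * 0.10).toInt)
    executorMemoryMb + overheadMb
  }

  def main(args: Array[String]): Unit = {
    // --executor-memory 1g  =>  1024 + 384 = 1408 MB requested from YARN
    println(s"container size ~= ${containerSizeMb(1024)} MB")
  }
}

YARN may then round that 1408 MB request up to the next multiple of its minimum allocation, so the figure shown in the UI can be even larger.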
BTW, it doesn't matter how much memory your program wants, but how much it reserves. In your example it will not take the 50 MB of the test but the ~1.5 GB (after overhead) per executor.

Hope this helps,
Assaf.

From: Cosmin Posteuca [mailto:cosmin.poste...@gmail.com]
Sent: Tuesday, February 14, 2017 9:53 AM
To: Egor Pahomov
Cc: user
Subject: Re: [Spark Launcher] How to launch parallel jobs?

Hi Egor,

About the first problem, I think you are right; it makes sense.

About the second problem, I checked the available resources on port 8088 and it shows 16 available cores. I start my job with 4 executors with 1 core each and 1 GB per executor. My job uses at most 50 MB of memory (just for the test). From my point of view the resources are enough, so I think the problem is in the YARN configuration files, but I don't know what is missing.

Thank you

2017-02-13 21:14 GMT+02:00 Egor Pahomov <pahomov.e...@gmail.com>:

About the second problem: as I understand it, this can happen in two cases: (1) one job prevents the other from getting resources for its executors, or (2) the bottleneck is reading from disk, so you cannot really parallelize that. I have no experience with the second case, but it's easy to verify the first one: just look at your Hadoop UI and verify that both jobs get enough resources.

2017-02-13 11:07 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>:

"But if i increase only executor-cores the finish time is the same". More experienced people can correct me if I'm wrong, but as far as I understand it, one partition is processed by one Spark task, and a task always runs on one core and is not parallelized across cores. So if you have 5 partitions and you increase the total number of cores in the cluster from 7 to 10, for example, you have not gained anything. But if you repartition, you give Spark the opportunity to process things in more threads, so more tasks can execute in parallel. (A short sketch follows at the end of this thread.)

2017-02-13 7:05 GMT-08:00 Cosmin Posteuca <cosmin.poste...@gmail.com>:

Hi,

I think I don't understand well enough how to launch jobs. I have one job which takes 60 seconds to finish. I run it with the following command:

spark-submit --executor-cores 1 \
  --executor-memory 1g \
  --driver-memory 1g \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=4 \
  --conf spark.dynamicAllocation.initialExecutors=4 \
  --conf spark.executor.instances=4 \

If I increase the number of partitions from the code and the number of executors, the app finishes faster, which is OK. But if I increase only executor-cores, the finish time is the same, and I don't understand why. I expect the time to be lower than the initial time.

My second problem: if I launch the above code twice, I expect both jobs to finish in 60 seconds, but this doesn't happen. Both jobs finish after 120 seconds, and I don't understand why.

I run this code on AWS EMR, on 2 instances (4 CPUs each, and each CPU has 2 threads). From what I saw in the default EMR configuration, YARN is set to the default FIFO ordering with the CapacityScheduler.

What do you think about these problems?

Thanks,
Cosmin

--
Sincerely yours
Egor Pakhomov

--
Sincerely yours
Egor Pakhomov
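To illustrate the partition point from Egor's reply above, here is a minimal sketch, assuming a standard SparkSession (the app name and sizes are arbitrary): only one task runs per partition, and each task uses one core, so 5 partitions can keep at most 5 cores busy no matter how many cores the cluster has.

import org.apache.spark.sql.SparkSession

// Sketch: parallelism is bounded by the number of partitions, not by
// the number of cores available in the cluster.
object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("repartition-sketch").getOrCreate()
    val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 5)
    println(rdd.getNumPartitions)    // 5 -> at most 5 concurrent tasks
    val wider = rdd.repartition(16)  // now up to 16 tasks can run in parallel
    println(wider.getNumPartitions)  // 16
    spark.stop()
  }
}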
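And for the question in the subject line, here is a minimal sketch of submitting two jobs in parallel with org.apache.spark.launcher.SparkLauncher; the jar path, main class, and app names below are placeholders, not taken from the thread.

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object ParallelLaunch {
  // Submit one application asynchronously; startApplication() returns a handle
  // immediately instead of blocking until the job finishes.
  def submit(name: String): SparkAppHandle =
    new SparkLauncher()
      .setAppName(name)
      .setAppResource("/path/to/your-app.jar")  // placeholder jar
      .setMainClass("com.example.YourJob")      // placeholder main class
      .setMaster("yarn")
      .setDeployMode("cluster")
      .setConf("spark.executor.instances", "4")
      .setConf("spark.executor.cores", "1")
      .setConf("spark.executor.memory", "1g")
      .startApplication()

  def main(args: Array[String]): Unit = {
    // Two submissions made back to back; they only run concurrently if YARN
    // has enough free vcores and memory for both sets of containers,
    // otherwise the second one queues behind the first.
    val handles = Seq(submit("job-1"), submit("job-2"))
    while (!handles.forall(_.getState.isFinal)) Thread.sleep(1000)
  }
}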