"But if i increase only executor-cores the finish time is the same". More experienced ones can correct me, if I'm wrong, but as far as I understand that: one partition processed by one spark task. Task is always running on 1 core and not parallelized among cores. So if you have 5 partitions and you increased totall number of cores among cluster from 7 to 10 for example - you have not gained anything. But if you repartition you give an opportunity to process thing in more threads, so now more tasks can execute in parallel.
2017-02-13 7:05 GMT-08:00 Cosmin Posteuca <cosmin.poste...@gmail.com>:

> Hi,
>
> I think i don't understand enough how to launch jobs.
>
> I have one job which takes 60 seconds to finish. I run it with following
> command:
>
> spark-submit --executor-cores 1 \
>   --executor-memory 1g \
>   --driver-memory 1g \
>   --master yarn \
>   --deploy-mode cluster \
>   --conf spark.dynamicAllocation.enabled=true \
>   --conf spark.shuffle.service.enabled=true \
>   --conf spark.dynamicAllocation.minExecutors=1 \
>   --conf spark.dynamicAllocation.maxExecutors=4 \
>   --conf spark.dynamicAllocation.initialExecutors=4 \
>   --conf spark.executor.instances=4 \
>
> If i increase number of partitions from code and number of executors the app
> will finish faster, which it's ok. But if i increase only executor-cores the
> finish time is the same, and i don't understand why. I expect the time to be
> lower than initial time.
>
> My second problem is if i launch twice above code i expect that both jobs to
> finish in 60 seconds, but this don't happen. Both jobs finish after 120
> seconds and i don't understand why.
>
> I run this code on AWS EMR, on 2 instances (4 cpu each, and each cpu has 2
> threads). From what i saw in default EMR configurations, yarn is set on
> FIFO (default) mode with CapacityScheduler.
>
> What do you think about this problems?
>
> Thanks,
>
> Cosmin

--
Sincerely yours,
Egor Pakhomov