"But if i increase only executor-cores the finish time is the same". More experienced ones can correct me, if I'm wrong, but as far as I understand that: one partition processed by one spark task. Task is always running on 1 core and not parallelized among cores. So if you have 5 partitions and you increased totall number of cores among cluster from 7 to 10 for example - you have not gained anything. But if you repartition you give an opportunity to process thing in more threads, so now more tasks can execute in parallel.
2017-02-13 7:05 GMT-08:00 Cosmin Posteuca <cosmin.poste...@gmail.com>:

> Hi,
>
> I think i don't understand enough how to launch jobs.
>
> I have one job which takes 60 seconds to finish. I run it with following
> command:
>
> spark-submit --executor-cores 1 \
>   --executor-memory 1g \
>   --driver-memory 1g \
>   --master yarn \
>   --deploy-mode cluster \
>   --conf spark.dynamicAllocation.enabled=true \
>   --conf spark.shuffle.service.enabled=true \
>   --conf spark.dynamicAllocation.minExecutors=1 \
>   --conf spark.dynamicAllocation.maxExecutors=4 \
>   --conf spark.dynamicAllocation.initialExecutors=4 \
>   --conf spark.executor.instances=4 \
>
> If i increase number of partitions from code and number of executors the app
> will finish faster, which it's ok. But if i increase only executor-cores the
> finish time is the same, and i don't understand why. I expect the time to be
> lower than initial time.
>
> My second problem is if i launch twice above code i expect that both jobs to
> finish in 60 seconds, but this don't happen. Both jobs finish after 120
> seconds and i don't understand why.
>
> I run this code on AWS EMR, on 2 instances (4 cpu each, and each cpu has 2
> threads). From what i saw in default EMR configurations, yarn is set on
> FIFO (default) mode with CapacityScheduler.
>
> What do you think about this problems?
>
> Thanks,
>
> Cosmin

--
Sincerely yours,
Egor Pakhomov