You should also check your memory usage. Say, for example, you have 16 cores and 8 GB of memory, and you use 4 executors with 1 core each. When you use an executor, Spark requests it from YARN, and YARN allocates the requested number of cores (1 in our case) and the memory. The memory is actually more than you asked for: if you ask for 1 GB, YARN will in fact allocate almost 1.5 GB once the overhead is added. In addition, it will probably allocate a container for the driver as well (probably with 1024 MB of memory). When you run your program and look at port 8088 (the YARN ResourceManager UI), you should look not only at VCores Used out of VCores Total, but also at Memory Used out of Memory Total. You should also navigate to the executors (e.g. Applications -> RUNNING on the left, then choose your application and navigate all the way down to a single container). There you can see the actual usage.
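To see where the "almost 1.5 GB" comes from, here is a rough back-of-the-envelope sketch (not Spark's actual code), assuming the usual default overhead rule of max(384 MB, 10% of the executor memory):

// Rough estimate of the YARN container size requested per executor.
// Assumes the default overhead rule max(384 MB, 10% of executor memory);
// the exact config key and rounding depend on your Spark and YARN versions.
object ContainerSizeEstimate {
  def containerSizeMb(executorMemoryMb: Int): Int = {
    val overheadMb = math.max(384, (executorMemoryMb * 0.10).toInt)
    executorMemoryMb + overheadMb
  }

  def main(args: Array[String]): Unit = {
    // --executor-memory 1g  =>  1024 + 384 = 1408 MB requested from YARN
    println(s"container size ~= ${containerSizeMb(1024)} MB")
  }
}

YARN may then round that 1408 MB request up to the next multiple of its minimum allocation, so the figure shown in the UI can be even larger.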
BTW, it doesn't matter how much memory your program wants, but how much it reserves. In your example it will not take the 50 MB of the test but the ~1.5 GB (after overhead) per executor.

Hope this helps,
Assaf.

From: Cosmin Posteuca [mailto:cosmin.poste...@gmail.com]
Sent: Tuesday, February 14, 2017 9:53 AM
To: Egor Pahomov
Cc: user
Subject: Re: [Spark Launcher] How to launch parallel jobs?

Hi Egor,

About the first problem, I think you are right; it makes sense.

About the second problem, I checked the available resources on port 8088 and it shows 16 available cores. I start my job with 4 executors with 1 core each and 1 GB per executor. My job uses at most 50 MB of memory (just for the test). From my point of view the resources are enough, so I think the problem is in the YARN configuration files, but I don't know what is missing.

Thank you

2017-02-13 21:14 GMT+02:00 Egor Pahomov <pahomov.e...@gmail.com>:

About the second problem: as I understand it, this can happen in two cases: (1) one job prevents the other from getting resources for its executors, or (2) the bottleneck is reading from disk, so you cannot really parallelize that. I have no experience with the second case, but it's easy to verify the first one: just look at your Hadoop UI and verify that both jobs get enough resources.

2017-02-13 11:07 GMT-08:00 Egor Pahomov <pahomov.e...@gmail.com>:

"But if i increase only executor-cores the finish time is the same". More experienced people can correct me if I'm wrong, but as far as I understand it, one partition is processed by one Spark task, and a task always runs on one core and is not parallelized across cores. So if you have 5 partitions and you increase the total number of cores in the cluster from 7 to 10, for example, you have not gained anything. But if you repartition, you give Spark the opportunity to process things in more threads, so more tasks can execute in parallel. (A short sketch follows at the end of this thread.)

2017-02-13 7:05 GMT-08:00 Cosmin Posteuca <cosmin.poste...@gmail.com>:

Hi,

I think I don't understand well enough how to launch jobs. I have one job which takes 60 seconds to finish. I run it with the following command:

spark-submit --executor-cores 1 \
  --executor-memory 1g \
  --driver-memory 1g \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=4 \
  --conf spark.dynamicAllocation.initialExecutors=4 \
  --conf spark.executor.instances=4 \

If I increase the number of partitions from the code and the number of executors, the app finishes faster, which is OK. But if I increase only executor-cores, the finish time is the same, and I don't understand why. I expect the time to be lower than the initial time.

My second problem: if I launch the above code twice, I expect both jobs to finish in 60 seconds, but this doesn't happen. Both jobs finish after 120 seconds, and I don't understand why.

I run this code on AWS EMR, on 2 instances (4 CPUs each, and each CPU has 2 threads). From what I saw in the default EMR configuration, YARN is set to the default FIFO ordering with the CapacityScheduler.

What do you think about these problems?

Thanks,
Cosmin

--
Sincerely yours
Egor Pakhomov

--
Sincerely yours
Egor Pakhomov
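To illustrate the partition point from Egor's reply above, here is a minimal sketch, assuming a standard SparkSession (the app name and sizes are arbitrary): only one task runs per partition, and each task uses one core, so 5 partitions can keep at most 5 cores busy no matter how many cores the cluster has.

import org.apache.spark.sql.SparkSession

// Sketch: parallelism is bounded by the number of partitions, not by
// the number of cores available in the cluster.
object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("repartition-sketch").getOrCreate()
    val rdd = spark.sparkContext.parallelize(1 to 1000000, numSlices = 5)
    println(rdd.getNumPartitions)    // 5 -> at most 5 concurrent tasks
    val wider = rdd.repartition(16)  // now up to 16 tasks can run in parallel
    println(wider.getNumPartitions)  // 16
    spark.stop()
  }
}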
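And for the question in the subject line, here is a minimal sketch of submitting two jobs in parallel with org.apache.spark.launcher.SparkLauncher; the jar path, main class, and app names below are placeholders, not taken from the thread.

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

object ParallelLaunch {
  // Submit one application asynchronously; startApplication() returns a handle
  // immediately instead of blocking until the job finishes.
  def submit(name: String): SparkAppHandle =
    new SparkLauncher()
      .setAppName(name)
      .setAppResource("/path/to/your-app.jar")  // placeholder jar
      .setMainClass("com.example.YourJob")      // placeholder main class
      .setMaster("yarn")
      .setDeployMode("cluster")
      .setConf("spark.executor.instances", "4")
      .setConf("spark.executor.cores", "1")
      .setConf("spark.executor.memory", "1g")
      .startApplication()

  def main(args: Array[String]): Unit = {
    // Two submissions made back to back; they only run concurrently if YARN
    // has enough free vcores and memory for both sets of containers,
    // otherwise the second one queues behind the first.
    val handles = Seq(submit("job-1"), submit("job-2"))
    while (!handles.forall(_.getState.isFinal)) Thread.sleep(1000)
  }
}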