Hi Akshay,

I believe HDP uses the capacity scheduler by default. In the capacity
scheduler, assignment of multiple containers on the same node is
determined by the option
yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled,
which is true by default. If you would like YARN to spread the
containers across nodes, you can set this to false.
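
As a sketch, the property would go into capacity-scheduler.xml roughly like
this (I'm assuming here that you manage the file through Ambari on HDP, in
which case you'd set it as a custom property there instead of editing the
file by hand):

```xml
<!-- capacity-scheduler.xml: disable assigning multiple containers
     to one node per scheduling heartbeat, so allocations spread out -->
<property>
  <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
  <value>false</value>
</property>
```

You'd need to restart the ResourceManager (or refresh the scheduler queues)
for the change to take effect.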

You can read about this and the associated parameters here:
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html

~ Hari


On Mon, May 20, 2019 at 11:16 AM Akshay Bhardwaj
<akshay.bhardwaj1...@gmail.com> wrote:
>
> Hi All,
>
> Just floating this email again. Grateful for any suggestions.
>
> Akshay Bhardwaj
> +91-97111-33849
>
>
> On Mon, May 20, 2019 at 12:25 AM Akshay Bhardwaj 
> <akshay.bhardwaj1...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I am running Spark 2.3 on YARN using HDP 2.6
>>
>> I am running a Spark job using dynamic resource allocation on YARN, with 
>> a minimum of 2 executors and a maximum of 6. My job reads data from Parquet 
>> files present in S3 buckets and stores some enriched data to Cassandra.
>>
>> My question is: how does YARN decide which nodes to launch containers on?
>> I have around 12 YARN nodes running in the cluster, but I still see repeated 
>> patterns of 3-4 containers launched on the same node for a particular job.
>>
>> What is the best way to start debugging the reason for this?
>>
>> Akshay Bhardwaj
>> +91-97111-33849

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
