It makes scheduling faster. If you have a node that can accommodate 20
containers and you schedule one container per heartbeat, it would take 20
heartbeats (roughly 20 seconds with the default 1-second node heartbeat
interval) to schedule all the containers. On the other hand, if you schedule
multiple containers per heartbeat, the node fills up much faster.
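For reference, turning this off in capacity-scheduler.xml would look roughly
like the snippet below. This is just a sketch: it assumes the capacity
scheduler is in use, and these per-node-heartbeat properties only exist in
newer Hadoop releases, so check the CapacityScheduler docs for your version.

```xml
<!-- capacity-scheduler.xml (sketch; assumes the capacity scheduler) -->
<property>
  <!-- When true (the default), the scheduler may assign several containers
       to a node in a single heartbeat; set to false to spread containers out. -->
  <name>yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled</name>
  <value>false</value>
</property>
```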

- Hari

On Mon, 20 May 2019, 15:40 Akshay Bhardwaj, <akshay.bhardwaj1...@gmail.com>
wrote:

> Hi Hari,
>
> Thanks for this information.
>
> Do you have any resources on (or could you explain) why YARN has this as
> the default behaviour? What would be the advantages of, or scenarios for,
> multiple assignments in a single heartbeat?
>
>
> Regards
> Akshay Bhardwaj
> +91-97111-33849
>
>
> On Mon, May 20, 2019 at 1:29 PM Hariharan <hariharan...@gmail.com> wrote:
>
>> Hi Akshay,
>>
>> I believe HDP uses the capacity scheduler by default. In the capacity
>> scheduler, assignment of multiple containers on the same node is
>> determined by the option
>> yarn.scheduler.capacity.per-node-heartbeat.multiple-assignments-enabled,
>> which is true by default. If you would like YARN to spread out the
>> containers, you can set this to false.
>>
>> You can learn about this and the associated parameters here:
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
>>
>> ~ Hari
>>
>>
>> On Mon, May 20, 2019 at 11:16 AM Akshay Bhardwaj
>> <akshay.bhardwaj1...@gmail.com> wrote:
>> >
>> > Hi All,
>> >
>> > Just floating this email again. Grateful for any suggestions.
>> >
>> > Akshay Bhardwaj
>> > +91-97111-33849
>> >
>> >
>> > On Mon, May 20, 2019 at 12:25 AM Akshay Bhardwaj <
>> akshay.bhardwaj1...@gmail.com> wrote:
>> >>
>> >> Hi All,
>> >>
>> >> I am running Spark 2.3 on YARN using HDP 2.6
>> >>
>> >> I am running a Spark job using dynamic resource allocation on YARN with a
>> minimum of 2 executors and a maximum of 6. My job reads data from Parquet
>> files present in S3 buckets and stores some enriched data in Cassandra.
>> >>
>> >> My question is, how does YARN decide on which nodes to launch containers?
>> >> I have around 12 YARN nodes running in the cluster, but I still see
>> repeated patterns of 3-4 containers launched on the same node for a
>> particular job.
>> >>
>> >> What is the best way to start debugging this?
>> >>
>> >> Akshay Bhardwaj
>> >> +91-97111-33849
>>
>
