Hi,

If your YARN cluster uses the fair scheduler, you could check whether the
yarn.scheduler.fair.assignmultiple config (documented at
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html)
is set. If it is, adjusting yarn.scheduler.fair.dynamic.max.assign and
yarn.scheduler.fair.max.assign could be helpful. Also, AFAIK, Flink does not
exert extra control over how YARN distributes applications across nodes. A key
difference between Flink and Spark is that most Flink jobs are unbounded while
Spark jobs are bounded, so it is possible that, under the same YARN scheduling
strategy, the final distribution of apps after some time differs.
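
For illustration, here is a minimal yarn-site.xml sketch of those knobs. The
values are placeholders to show the shape of the config, not recommendations;
tune them for your cluster:

  <!-- yarn-site.xml: fair scheduler assignment knobs (illustrative values) -->
  <property>
    <name>yarn.scheduler.fair.assignmultiple</name>
    <value>true</value>  <!-- allow multiple container assignments per node heartbeat -->
  </property>
  <property>
    <name>yarn.scheduler.fair.dynamic.max.assign</name>
    <value>false</value> <!-- turn off the dynamic cap so max.assign below applies -->
  </property>
  <property>
    <name>yarn.scheduler.fair.max.assign</name>
    <value>2</value>     <!-- a low per-heartbeat cap tends to spread containers out -->
  </property>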

Best,
Biao Geng

From: Lu Niu <qqib...@gmail.com>
Date: Thursday, September 7, 2023 at 12:17 AM
To: Geng Biao <biaoge...@gmail.com>
Cc: Chen Zhanghao <zhanghao.c...@outlook.com>, Weihua Hu 
<huweihua....@gmail.com>, Kenan Kılıçtepe <kkilict...@gmail.com>, user 
<user@flink.apache.org>
Subject: Re: Uneven TM Distribution of Flink on YARN
Hi, thanks for all your help. Are there any other insights?


Best
Lu

On Wed, Aug 30, 2023 at 11:29 AM Lu Niu <qqib...@gmail.com> wrote:
No, we don't use yarn.taskmanager.node-label.

Best
Lu

On Tue, Aug 29, 2023 at 12:17 AM Geng Biao <biaoge...@gmail.com> wrote:
Maybe you can check whether you have set yarn.taskmanager.node-label for some
Flink jobs?
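
For example, a sketch of how that option is usually passed as a dynamic
property at submission time on Flink 1.15; the label "gpu" and the jar name
below are hypothetical placeholders:

  # pins this job's TaskManagers to YARN nodes carrying the "gpu" label
  flink run -t yarn-per-job -Dyarn.taskmanager.node-label=gpu myJob.jar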

Best,
Biao Geng

Sent from Outlook for iOS <https://aka.ms/o0ukef>
________________________________
From: Chen Zhanghao <zhanghao.c...@outlook.com>
Sent: Tuesday, August 29, 2023 12:14:53 PM
To: Lu Niu <qqib...@gmail.com>; Weihua Hu <huweihua....@gmail.com>
Cc: Kenan Kılıçtepe <kkilict...@gmail.com>; user <user@flink.apache.org>
Subject: Re: Uneven TM Distribution of Flink on YARN

CCing @Weihua Hu, who is an expert on this. Do you have any ideas about the
phenomenon here?

Best,
Zhanghao Chen
________________________________
From: Lu Niu <qqib...@gmail.com>
Sent: Tuesday, August 29, 2023 12:11:35 PM
To: Chen Zhanghao <zhanghao.c...@outlook.com>
Cc: Kenan Kılıçtepe <kkilict...@gmail.com>; user <user@flink.apache.org>
Subject: Re: Uneven TM Distribution of Flink on YARN

Thanks for your reply.

The interesting fact is that we also manage Spark on YARN; however, only the
Flink cluster is having this issue. I am wondering whether there is a
difference in the implementation on the Flink side.

Best
Lu

On Mon, Aug 28, 2023 at 8:38 PM Chen Zhanghao <zhanghao.c...@outlook.com> wrote:
Hi Lu Niu,

TM distribution across YARN nodes is managed by the YARN ResourceManager and
is out of the scope of Flink. On the other hand, cluster.evenly-spread-out-slots
forces even distribution of tasks among Flink TMs and has nothing to do with
your concern here. Also, that config currently only supports standalone-mode
Flink clusters and does not take effect on a Flink cluster on YARN.
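
For what it's worth, a minimal flink-conf.yaml sketch of where that option
lives, with the caveat above in mind:

  # only honored by standalone-mode clusters in Flink 1.15;
  # it does not influence where YARN places dynamically allocated TMs
  cluster.evenly-spread-out-slots: true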

Best,
Zhanghao Chen
________________________________
From: Lu Niu <qqib...@gmail.com>
Sent: August 29, 2023 4:30
To: Kenan Kılıçtepe <kkilict...@gmail.com>
Cc: user <user@flink.apache.org>
Subject: Re: Uneven TM Distribution of Flink on YARN

Thanks for the reply. We've already set cluster.evenly-spread-out-slots = true.

Best
Lu

On Mon, Aug 28, 2023 at 1:23 PM Kenan Kılıçtepe <kkilict...@gmail.com> wrote:
Have you checked the config param cluster.evenly-spread-out-slots?


On Mon, Aug 28, 2023 at 10:31 PM Lu Niu <qqib...@gmail.com> wrote:
Hi, Flink users

We have recently observed that the allocation of Flink TaskManagers in our YARN 
cluster is not evenly distributed. We would like to hear your thoughts on this 
matter.

1. Our setup includes Flink 1.15.1 and Hadoop 2.10.0.
2. As an example of the unevenness: out of a 370-node YARN cluster, 16 nodes
have either 0 or 1 vCore available, while 110 nodes have more than 10 vCores
available.
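
For reference, per-node availability like the above can be pulled from the
standard YARN CLI (<node-id> below is a placeholder):

  yarn node -list -all         # enumerate all nodes and their IDs
  yarn node -status <node-id>  # prints CPU-Used / CPU-Capacity for one node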

Is such behavior expected? If not, is there a fix provided in Flink? Thanks!

Best
Lu
