Thanks, I'll check it out!

Best
Lu
On Wed, Sep 6, 2023 at 10:09 AM Biao Geng <biaoge...@gmail.com> wrote:

> Hi,
>
> If your YARN cluster uses the fair scheduler, maybe you can check whether the
> yarn.scheduler.fair.assignmultiple
> <https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>
> config is set. If that's the case, then adjusting
> yarn.scheduler.fair.dynamic.max.assign and yarn.scheduler.fair.max.assign
> could be helpful. Also, AFAIK, Flink does not exert extra control over the
> distribution of YARN apps on different nodes. The key difference between
> Flink and Spark is that most Flink jobs are unbounded while Spark jobs are
> bounded. It is possible that, under the same YARN scheduling strategy, the
> final distribution of apps after some time is different.
>
> Best,
> Biao Geng
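A quick way to see whether those fair-scheduler settings are in effect: a minimal sketch, assuming shell access to the ResourceManager host and a config directory of /etc/hadoop/conf (the path is an assumption and varies by distribution):

    # Check whether the fair-scheduler assignment settings are present.
    # The config path below is a guess; adjust it for your Hadoop distribution.
    grep -A 2 -E "yarn\.scheduler\.fair\.(assignmultiple|dynamic\.max\.assign|max\.assign)" \
        /etc/hadoop/conf/yarn-site.xml

With the usual multi-line property layout, the -A 2 option also prints the <value> line that follows each matching <name> line.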
> From: Lu Niu <qqib...@gmail.com>
> Date: Thursday, September 7, 2023 at 12:17 AM
> To: Geng Biao <biaoge...@gmail.com>
> Cc: Chen Zhanghao <zhanghao.c...@outlook.com>, Weihua Hu <huweihua....@gmail.com>, Kenan Kılıçtepe <kkilict...@gmail.com>, user <user@flink.apache.org>
> Subject: Re: Uneven TM Distribution of Flink on YARN
>
> Hi, thanks for all your help. Are there any other insights?
>
> Best
> Lu
>
> On Wed, Aug 30, 2023 at 11:29 AM Lu Niu <qqib...@gmail.com> wrote:
>
> No, we don't use yarn.taskmanager.node-label.
>
> Best
> Lu
>
> On Tue, Aug 29, 2023 at 12:17 AM Geng Biao <biaoge...@gmail.com> wrote:
>
> Maybe you can check whether you have set yarn.taskmanager.node-label for some Flink jobs?
>
> Best,
> Biao Geng
>
> Sent from Outlook for iOS <https://aka.ms/o0ukef>
> ------------------------------
>
> From: Chen Zhanghao <zhanghao.c...@outlook.com>
> Sent: Tuesday, August 29, 2023 12:14:53 PM
> To: Lu Niu <qqib...@gmail.com>; Weihua Hu <huweihua....@gmail.com>
> Cc: Kenan Kılıçtepe <kkilict...@gmail.com>; user <user@flink.apache.org>
> Subject: Re: Uneven TM Distribution of Flink on YARN
>
> CCing @Weihua Hu <huweihua....@gmail.com>, who is an expert on this. Do you have any ideas on the phenomenon here?
>
> Best,
> Zhanghao Chen
> ------------------------------
>
> From: Lu Niu <qqib...@gmail.com>
> Sent: Tuesday, August 29, 2023 12:11:35 PM
> To: Chen Zhanghao <zhanghao.c...@outlook.com>
> Cc: Kenan Kılıçtepe <kkilict...@gmail.com>; user <user@flink.apache.org>
> Subject: Re: Uneven TM Distribution of Flink on YARN
>
> Thanks for your reply.
>
> The interesting fact is that we also manage Spark on YARN; however, only the Flink cluster is having the issue. I am wondering whether there is a difference in the implementation on the Flink side.
>
> Best
> Lu
>
> On Mon, Aug 28, 2023 at 8:38 PM Chen Zhanghao <zhanghao.c...@outlook.com> wrote:
>
> Hi Lu Niu,
>
> TM distribution on YARN nodes is managed by the YARN RM and is out of the scope of Flink. On the other hand, cluster.evenly-spread-out-slots forces even distribution of tasks among Flink TMs, and has nothing to do with your concerns. Also, the config currently only supports Standalone-mode Flink clusters, and does not take effect on a Flink cluster on YARN.
>
> Best,
> Zhanghao Chen
> ------------------------------
>
> From: Lu Niu <qqib...@gmail.com>
> Sent: August 29, 2023, 4:30
> To: Kenan Kılıçtepe <kkilict...@gmail.com>
> Cc: user <user@flink.apache.org>
> Subject: Re: Uneven TM Distribution of Flink on YARN
>
> Thanks for the reply. We've already set cluster.evenly-spread-out-slots = true.
>
> Best
> Lu
>
> On Mon, Aug 28, 2023 at 1:23 PM Kenan Kılıçtepe <kkilict...@gmail.com> wrote:
>
> Have you checked the config param cluster.evenly-spread-out-slots?
>
> On Mon, Aug 28, 2023 at 10:31 PM Lu Niu <qqib...@gmail.com> wrote:
>
> Hi, Flink users,
>
> We have recently observed that the allocation of Flink TaskManagers in our YARN cluster is not evenly distributed. We would like to hear your thoughts on this matter.
>
> 1. Our setup includes Flink version 1.15.1 and Hadoop 2.10.0.
> 2. The uneven distribution is that, out of a 370-node YARN cluster, there are 16 nodes with either 0 or 1 vCore available, while 110 nodes have more than 10 vCores available.
>
> Is such behavior expected? If not, is there a fix provided in Flink? Thanks!
>
> Best
> Lu
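To quantify the skew described above, here is a minimal sketch that lists available vCores per NodeManager through the ResourceManager REST API; rm-host:8088 is a placeholder for your RM web address, and jq is assumed to be installed:

    # List available vCores per node via the RM REST API, sorted ascending,
    # to see how uneven the distribution is.
    # "rm-host:8088" is a placeholder; point it at your ResourceManager.
    curl -s "http://rm-host:8088/ws/v1/cluster/nodes" \
      | jq -r '.nodes.node[] | "\(.nodeHostName)\t\(.availableVirtualCores)"' \
      | sort -t$'\t' -k2 -n

Nodes with 0 or 1 available vCores will show up at the top of the output.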