Thanks, I'll check it out!

Best
Lu
On Wed, Sep 6, 2023 at 10:09 AM Biao Geng <biaoge...@gmail.com> wrote:

> Hi,
>
> If your YARN cluster uses the fair scheduler, maybe you can check whether the
> yarn.scheduler.fair.assignmultiple
> <https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/FairScheduler.html>
> config is set. If that's the case, then adjusting
> yarn.scheduler.fair.dynamic.max.assign and yarn.scheduler.fair.max.assign
> could be helpful. Also, AFAIK, Flink does not exert extra control over the
> distribution of YARN apps on different nodes. The key difference between
> Flink and Spark is that most Flink jobs are unbounded while Spark jobs are
> bounded. It is possible that, under the same YARN scheduling strategy, the
> final distribution of apps after some time is different.
>
> Best,
> Biao Geng
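A quick way to see whether those fair-scheduler settings are in effect: a minimal sketch, assuming shell access to the ResourceManager host and a config directory of /etc/hadoop/conf (the path is an assumption and varies by distribution):

    # Check whether the fair-scheduler assignment settings are present.
    # The config path below is a guess; adjust it for your Hadoop distribution.
    grep -A 2 -E "yarn\.scheduler\.fair\.(assignmultiple|dynamic\.max\.assign|max\.assign)" \
        /etc/hadoop/conf/yarn-site.xml

With the usual multi-line property layout, the -A 2 option also prints the <value> line that follows each matching <name> line.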
> From: Lu Niu <qqib...@gmail.com>
> Date: Thursday, September 7, 2023 at 12:17 AM
> To: Geng Biao <biaoge...@gmail.com>
> Cc: Chen Zhanghao <zhanghao.c...@outlook.com>, Weihua Hu <huweihua....@gmail.com>, Kenan Kılıçtepe <kkilict...@gmail.com>, user <user@flink.apache.org>
> Subject: Re: Uneven TM Distribution of Flink on YARN
>
> Hi, thanks for all your help. Are there any other insights?
>
> Best
> Lu
>
> On Wed, Aug 30, 2023 at 11:29 AM Lu Niu <qqib...@gmail.com> wrote:
>
> No, we don't use yarn.taskmanager.node-label.
>
> Best
> Lu
>
> On Tue, Aug 29, 2023 at 12:17 AM Geng Biao <biaoge...@gmail.com> wrote:
>
> Maybe you can check whether you have set yarn.taskmanager.node-label for some Flink jobs?
>
> Best,
> Biao Geng
>
> Sent from Outlook for iOS <https://aka.ms/o0ukef>
> ------------------------------
>
> From: Chen Zhanghao <zhanghao.c...@outlook.com>
> Sent: Tuesday, August 29, 2023 12:14:53 PM
> To: Lu Niu <qqib...@gmail.com>; Weihua Hu <huweihua....@gmail.com>
> Cc: Kenan Kılıçtepe <kkilict...@gmail.com>; user <user@flink.apache.org>
> Subject: Re: Uneven TM Distribution of Flink on YARN
>
> CCing @Weihua Hu <huweihua....@gmail.com>, who is an expert on this. Do you have any ideas on the phenomenon here?
>
> Best,
> Zhanghao Chen
> ------------------------------
>
> From: Lu Niu <qqib...@gmail.com>
> Sent: Tuesday, August 29, 2023 12:11:35 PM
> To: Chen Zhanghao <zhanghao.c...@outlook.com>
> Cc: Kenan Kılıçtepe <kkilict...@gmail.com>; user <user@flink.apache.org>
> Subject: Re: Uneven TM Distribution of Flink on YARN
>
> Thanks for your reply.
>
> The interesting fact is that we also manage Spark on YARN; however, only the Flink cluster is having the issue. I am wondering whether there is a difference in the implementation on the Flink side.
>
> Best
> Lu
>
> On Mon, Aug 28, 2023 at 8:38 PM Chen Zhanghao <zhanghao.c...@outlook.com> wrote:
>
> Hi Lu Niu,
>
> TM distribution on YARN nodes is managed by the YARN RM and is out of the scope of Flink. On the other hand, cluster.evenly-spread-out-slots forces even distribution of tasks among Flink TMs, and has nothing to do with your concerns. Also, the config currently only supports Standalone-mode Flink clusters, and does not take effect on a Flink cluster on YARN.
>
> Best,
> Zhanghao Chen
> ------------------------------
>
> From: Lu Niu <qqib...@gmail.com>
> Sent: August 29, 2023, 4:30
> To: Kenan Kılıçtepe <kkilict...@gmail.com>
> Cc: user <user@flink.apache.org>
> Subject: Re: Uneven TM Distribution of Flink on YARN
>
> Thanks for the reply. We've already set cluster.evenly-spread-out-slots = true.
>
> Best
> Lu
>
> On Mon, Aug 28, 2023 at 1:23 PM Kenan Kılıçtepe <kkilict...@gmail.com> wrote:
>
> Have you checked the config param cluster.evenly-spread-out-slots?
>
> On Mon, Aug 28, 2023 at 10:31 PM Lu Niu <qqib...@gmail.com> wrote:
>
> Hi, Flink users,
>
> We have recently observed that the allocation of Flink TaskManagers in our YARN cluster is not evenly distributed. We would like to hear your thoughts on this matter.
>
> 1. Our setup includes Flink version 1.15.1 and Hadoop 2.10.0.
> 2. The uneven distribution is that, out of a 370-node YARN cluster, there are 16 nodes with either 0 or 1 vCore available, while 110 nodes have more than 10 vCores available.
>
> Is such behavior expected? If not, is there a fix provided in Flink? Thanks!
>
> Best
> Lu
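To quantify the skew described above, here is a minimal sketch that lists available vCores per NodeManager through the ResourceManager REST API; rm-host:8088 is a placeholder for your RM web address, and jq is assumed to be installed:

    # List available vCores per node via the RM REST API, sorted ascending,
    # to see how uneven the distribution is.
    # "rm-host:8088" is a placeholder; point it at your ResourceManager.
    curl -s "http://rm-host:8088/ws/v1/cluster/nodes" \
      | jq -r '.nodes.node[] | "\(.nodeHostName)\t\(.availableVirtualCores)"' \
      | sort -t$'\t' -k2 -n

Nodes with 0 or 1 available vCores will show up at the top of the output.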