CCing @Weihua Hu <huweihua....@gmail.com>, who is an expert on this. Do you have any ideas on the phenomenon here?
Best,
Zhanghao Chen

________________________________
From: Lu Niu <qqib...@gmail.com>
Sent: Tuesday, August 29, 2023 12:11:35 PM
To: Chen Zhanghao <zhanghao.c...@outlook.com>
Cc: Kenan Kılıçtepe <kkilict...@gmail.com>; user <user@flink.apache.org>
Subject: Re: Uneven TM Distribution of Flink on YARN

Thanks for your reply. The interesting fact is that we also manage Spark on YARN, yet only the Flink cluster shows this issue. I am wondering whether there is a difference in the implementation on the Flink side.

Best
Lu

On Mon, Aug 28, 2023 at 8:38 PM Chen Zhanghao <zhanghao.c...@outlook.com> wrote:

Hi Lu Niu,

TM distribution across YARN nodes is managed by the YARN ResourceManager and is out of Flink's scope. On the other hand, cluster.evenly-spread-out-slots forces an even spread of slots across Flink TMs and has nothing to do with your concern. Also, the config currently only takes effect for Standalone-mode Flink clusters and has no effect on a Flink cluster on YARN.

Best,
Zhanghao Chen

________________________________
From: Lu Niu <qqib...@gmail.com>
Sent: August 29, 2023 4:30
To: Kenan Kılıçtepe <kkilict...@gmail.com>
Cc: user <user@flink.apache.org>
Subject: Re: Uneven TM Distribution of Flink on YARN

Thanks for the reply. We've already set cluster.evenly-spread-out-slots = true.

Best
Lu

On Mon, Aug 28, 2023 at 1:23 PM Kenan Kılıçtepe <kkilict...@gmail.com> wrote:

Have you checked the config param cluster.evenly-spread-out-slots?

On Mon, Aug 28, 2023 at 10:31 PM Lu Niu <qqib...@gmail.com> wrote:

Hi, Flink users,

We have recently observed that the allocation of Flink TaskManagers in our YARN cluster is not evenly distributed, and we would like to hear your thoughts on this.

1. Our setup runs Flink 1.15.1 on Hadoop 2.10.0.
2. The unevenness looks like this: out of a 370-node YARN cluster, 16 nodes have only 0 or 1 vCore available, while 110 nodes have more than 10 vCores available.

Is such behavior expected? If not, does Flink provide a fix?

Thanks!

Best
Lu
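For reference, the slot-spreading option discussed above is a flink-conf.yaml setting and, as noted earlier in the thread, only takes effect for Standalone-mode clusters. A minimal sketch:

    # flink-conf.yaml -- honored by Standalone-mode clusters only;
    # ignored when Flink runs on YARN
    cluster.evenly-spread-out-slots: true

On the YARN side, container placement is decided by the ResourceManager's scheduler rather than by Flink. If the cluster happens to use the Fair Scheduler, its per-heartbeat assignment behavior is one possible, unconfirmed source of many containers being packed onto a few nodes; the snippet below is only an assumed starting point for investigation, not a verified fix for this cluster:

    <!-- yarn-site.xml (Fair Scheduler), assumed example -->
    <!-- When assignmultiple is enabled, the scheduler may place several
         containers on the same node in a single heartbeat, which can skew
         placement; leaving it false (the default), or capping the count via
         yarn.scheduler.fair.max.assign, spreads containers more evenly. -->
    <property>
      <name>yarn.scheduler.fair.assignmultiple</name>
      <value>false</value>
    </property>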