Hi Lu,

Yang is right. Enabling cgroup isolation is probably what you are looking for to control how Flink utilizes CPU resources. One more idea is to enable DominantResourceCalculator [1], which I think you have probably done already.
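As a rough sketch of what those two settings could look like, assuming the Capacity Scheduler and the LinuxContainerExecutor are in use: only the strict-resource-usage flag is taken from this thread. The other entries are standard Hadoop properties added here for illustration, so please check them against the documentation for your Hadoop version.

```xml
<!-- yarn-site.xml on every NodeManager (illustrative sketch, not a complete config) -->
<property>
  <!-- cgroup enforcement requires the Linux container executor -->
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- assumption: the cgroups-based resources handler is used -->
  <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
</property>
<property>
  <!-- from this thread: hard-cap each container at its allocated CPU -->
  <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
  <value>true</value>
</property>

<!-- capacity-scheduler.xml (assumes the Capacity Scheduler) -->
<property>
  <!-- make the scheduler account for vcores as well as memory -->
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

The NodeManagers need a restart to pick up changes to yarn-site.xml, and cgroups must be available on each node for the cap to take effect.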
Found an interesting read [2] about this if you would like to dig deeper.

Thanks,
Rong

[1] https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/apidocs/org/apache/hadoop/yarn/util/resource/DominantResourceCalculator.html
[2] https://developer.ibm.com/hadoop/2017/06/30/deep-dive-yarn-cgroups/

--
Rong

On Fri, Nov 8, 2019 at 3:51 AM Yang Wang <danrtsey...@gmail.com> wrote:

> Hi Lu Niu,
>
> Yes, you can use `yarn.containers.vcores` to set the vcores of a
> TaskManager. However, that does not guarantee that applications will
> not affect each other. By default, the YARN cluster uses cgroup CPU
> shares. That means a TaskManager can use more CPU than it was
> allocated. When the machine is under heavy load, the Linux kernel uses
> the CPU shares as weights to arbitrate between processes.
>
> If you want to limit a TaskManager to only the CPU it was allocated,
> `yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage=true`
> is the only way. The YARN NodeManager will then set a CPU quota for
> each TaskManager.
>
> Best,
> Yang
>
> On Thu, Nov 7, 2019 at 1:15 AM Lu Niu <qqib...@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for replying! Basically, I want to limit CPU usage so that
>> different applications don't affect each other. What is the current
>> best practice? It looks like
>> `yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage=true`
>> is one way. How do I set how many CPU resources to use? Is it
>> `yarn.containers.vcores`?
>>
>> It should be -ys, not -yn, in my original post; sorry for the typo.
>>
>> Best,
>> Lu
>>
>> On Wed, Nov 6, 2019 at 1:41 AM Yang Wang <danrtsey...@gmail.com> wrote:
>>
>>> If you want to limit the CPU usage of the TaskManager container, that
>>> depends on your YARN cluster configuration.
>>> By default, YARN only uses CPU shares. You need to set
>>> `yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage=true`
>>> in the yarn-site.xml of all YARN NodeManagers.
>>>
>>> Best,
>>> Yang
>>>
>>> On Wed, Nov 6, 2019 at 5:02 PM Victor Wong <jiasheng.w...@outlook.com> wrote:
>>>
>>>> Hi Lu,
>>>>
>>>> You can check which operator thread causes the high CPU usage and
>>>> assign it a unique slot sharing group name [1], to prevent too many
>>>> operator threads from running in the same TM.
>>>>
>>>> Hope this is helpful 😊
>>>>
>>>> [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#task-chaining-and-resource-groups
>>>>
>>>> Best,
>>>> Victor
>>>>
>>>> From: Vino Yang <yanghua1...@gmail.com>
>>>> Date: Wednesday, 6 November 2019 at 4:26 PM
>>>> To: Lu Niu <qqib...@gmail.com>
>>>> Cc: user <user@flink.apache.org>
>>>> Subject: Re: Limit max cpu usage per TaskManager
>>>>
>>>> Hi Lu,
>>>>
>>>> When running Flink on YARN, Flink relies on YARN's resource
>>>> management capabilities; Flink itself cannot currently limit CPU
>>>> usage.
>>>>
>>>> Also, what version of Flink do you use? As far as I know, since
>>>> Flink 1.8 the -yn parameter no longer works.
>>>>
>>>> Best,
>>>> Vino
>>>>
>>>> On Wed, Nov 6, 2019 at 1:29 PM Lu Niu <qqib...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> When running a Flink application in YARN mode, is there a way to
>>>> limit the maximum CPU usage per TaskManager?
>>>>
>>>> I tried this with an application that has just a source and a sink
>>>> operator. The parallelism of the source is 60 and the parallelism of
>>>> the sink is 1. With the default config, 60 TaskManagers are
>>>> assigned. I notice that the CPU usage of one TaskManager process can
>>>> reach 200% while the rest stay below 50%.
>>>>
>>>> When I set -yn = 2 (default is 1), the number of TaskManagers
>>>> dropped to 30, and the CPU usage of one TaskManager process can
>>>> reach 600% while the rest stay below 50%.
>>>>
>>>> When I tried setting yarn.containers.vcores = 2, all tasks stayed in
>>>> the start state forever and the application never reached the
>>>> running state.
>>>>
>>>> Best,
>>>> Lu