@tao I think we cannot limit the CPU usage of a slot, nor isolate CPU usage between slots. We do have CPU limits for the task executor in some scenarios, such as on YARN with strict cgroup mode.
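To illustrate what CPU bookkeeping means in this context, here is a minimal Python sketch. All names (`TaskExecutorBookkeeper`, `try_allocate`, `free`) are hypothetical and are not Flink APIs; the sketch only assumes that the task executor tracks how many cores it has promised to slots and rejects a request that no longer fits. Nothing in it throttles or pauses a slot at runtime.

```python
from dataclasses import dataclass, field


@dataclass
class TaskExecutorBookkeeper:
    """Hypothetical per-task-executor CPU bookkeeping (not a Flink class)."""

    total_cpu_cores: float
    # Maps a slot id to the number of cores promised to that slot.
    allocated: dict = field(default_factory=dict)

    def available_cpu_cores(self) -> float:
        """Cores not yet promised to any slot."""
        return self.total_cpu_cores - sum(self.allocated.values())

    def try_allocate(self, slot_id: str, cpu_cores: float) -> bool:
        """Admit the slot only if the requested cores still fit."""
        if slot_id in self.allocated or cpu_cores > self.available_cpu_cores():
            return False
        self.allocated[slot_id] = cpu_cores
        return True

    def free(self, slot_id: str) -> None:
        """Return the cores of a released slot to the pool."""
        self.allocated.pop(slot_id, None)


te = TaskExecutorBookkeeper(total_cpu_cores=4.0)
print(te.try_allocate("slot-1", 2.5))  # fits within 4.0 cores
print(te.try_allocate("slot-2", 2.0))  # rejected: only 1.5 cores left
te.free("slot-1")
print(te.try_allocate("slot-2", 2.0))  # fits again after slot-1 is freed
```

The check happens only when a slot is requested, so a task that uses more CPU than it declared is not slowed down by this bookkeeping.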
The purpose of bookkeeping and dynamically allocating CPU cores is to prevent scheduling tasks with too much computation load onto a task executor, rather than to limit the CPU usage of each slot.

Thank you~

Xintong Song

On Wed, Sep 18, 2019 at 12:18 AM tao xiao <xiaotao...@gmail.com> wrote:

> Sorry if I ask a question that has been addressed before. Please point me to the reference.
>
> How do we limit the CPU usage of a slot? Does the thread that executes the slot get paused when it uses more CPU cycles than it requested?
>
> On Tue, Sep 17, 2019 at 10:23 PM Xintong Song <tonysong...@gmail.com> wrote:
>
> > Thanks for the feedback, Andrey.
> >
> > I'll start the vote.
> >
> > Thank you~
> >
> > Xintong Song
> >
> > On Tue, Sep 17, 2019 at 10:09 PM Andrey Zagrebin <azagre...@apache.org> wrote:
> >
> > > Thanks for the update @Xintong.
> > > I would be ok with starting the vote.
> > >
> > > Best,
> > > Andrey
> > >
> > > On Tue, Sep 17, 2019 at 6:12 AM Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > > The implementation plan [1] is updated, with the following changes:
> > > >
> > > > - Add default slot resource profile to ResourceManagerGateway#registerTaskExecutor rather than #sendSlotReport.
> > > > - Swap 'TaskExecutor derive and register with default slot resource profile' and 'Extend TaskExecutor to support dynamic slot allocation'
> > > > - Add step for updating RestAPI / Web UI
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > >
> > > > On Tue, Sep 17, 2019 at 11:49 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > > @Till
> > > > > Thanks for the reminder. I'll add a step for updating the web UI. I'll try to involve Lining to help us with this step.
> > > > >
> > > > > @Andrey
> > > > > I was thinking that after we define the RM-TM interfaces in step 2, it would be good to concurrently work on both RM and TM side. But yes, if we finish Step 4 early, then it would make step 6 easier. We can start to have some IT/E2E tests, with the default slot resource profiles being available.
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > > On Mon, Sep 16, 2019 at 9:50 PM Andrey Zagrebin <and...@ververica.com> wrote:
> > > > >
> > > > >> @Xintong
> > > > >>
> > > > >> Thanks for the feedback.
> > > > >>
> > > > >> Just to clarify step 6:
> > > > >> If the first point is done before step 5 (e.g. as part of 4) then it is just keeping the info about the default slot in RM's data structure associated with the TM and no real change in the behaviour.
> > > > >> When this info is available, I think it can be straightforwardly used during step 5 where we get either concrete slot requirement or the unknown one (step 6, point 2) which simply grabs some of the concrete default ones (btw not clear which one, seems just some random?)
> > > > >>
> > > > >> For steps 5,7, true, it is not quite clear whether we can avoid some split, e.g. after step 5 before doing step 7.
> > > > >> I agree that we should introduce the feature flag if we clearly see that it would be a bigger effort without the flag.
> > > > >>
> > > > >> Best,
> > > > >> Andrey
> > > > >>
> > > > >> On Mon, Sep 16, 2019 at 3:21 PM Till Rohrmann <trohrm...@apache.org> wrote:
> > > > >>
> > > > >> > One thing which was briefly mentioned in the Flip but not in the implementation plan is the update of the web UI.
> > > > >> > I think it is worth putting an extra item for updating the web UI to properly display the resources a TM has still to offer with dynamic slot allocation. I guess we need to pull in some JavaScript help in order to implement this step.
> > > > >> >
> > > > >> > Cheers,
> > > > >> > Till
> > > > >> >
> > > > >> > On Mon, Sep 16, 2019 at 2:15 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >> >
> > > > >> > > Thanks for the comments, Andrey.
> > > > >> > >
> > > > >> > > - I agree that instead of ResourceManagerGateway#sendSlotReport, we should add the default slot resource profile to ResourceManagerGateway#registerTaskExecutor.
> > > > >> > >
> > > > >> > > - If I understand correctly, the reason you suggest doing the default slot resource profile first and then doing step 3 in a way that supports both TaskExecutorGateway#requestSlot and TaskExecutorGateway#requestResource is to try to avoid splitting code paths with the feature option? I think we can do that, but I also want to bring it up that this can only reduce the code split by the feature option (which is good) but not eliminate it. We still need the feature option for the fundamental differences, e.g. creating new SlotIDs on allocation vs. allocating to free slots with existing SlotIDs.
> > > > >> > >
> > > > >> > > - I don't really think we can do step 5, 6 and 7 independently. Basically they are all making changes to the same component. We probably can do step 6 and 7 independently, but I think they both depend on step 5.
> > > > >> > >
> > > > >> > > In general, I would say it's good to have as little code as possible split by the feature option, which makes the later clean-up easier. But if it cannot be easily done, I would rather not put too much effort into having a good abstraction and deduplication between the new code path and the original one that we are removing soon.
> > > > >> > >
> > > > >> > > What do you think?
> > > > >> > >
> > > > >> > > Thank you~
> > > > >> > >
> > > > >> > > Xintong Song
> > > > >> > >
> > > > >> > > On Mon, Sep 16, 2019 at 5:59 PM Andrey Zagrebin <and...@ververica.com> wrote:
> > > > >> > >
> > > > >> > > > Hi Xintong,
> > > > >> > > >
> > > > >> > > > Thanks for sharing the implementation steps. I also think they make sense with the feature option.
> > > > >> > > >
> > > > >> > > > I was wondering if we could order the steps in a way that each change does not affect other components too much, always having a working system; then maybe the feature option does not always need to split the code. Here are some thoughts.
> > > > >> > > >
> > > > >> > > > - We could do the default slot profile first and include it in the TM registration. I would suggest adding it to ResourceManagerGateway#registerTaskExecutor, not sendSlotReport. This way RM knows about it but does not use it at this point.
> > > > >> > > > (parts of step 4,6)
> > > > >> > > >
> > > > >> > > > - We could try to do step 3 firstly in a way that it also supports the current way of allocation in TaskExecutorGateway#requestSlot with the default slot profile and sends reports both with available resources and with free default slots which correspond to the available resources. We can just remove free default slots later. The new way of TaskExecutorGateway#requestResource could be also implemented here but not used yet.
> > > > >> > > >
> > > > >> > > > - Then step 5 can use the new TaskExecutorGateway#requestResource and the default slot profile
> > > > >> > > >
> > > > >> > > > - Not sure, step 5 and 7 can be implemented independently without regression of what we have. Maybe if we do step 7 firstly it will have only default slots firstly and it will simplify step 5 later.
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Andrey
> > > > >> > > >
> > > > >> > > > On Mon, Sep 16, 2019 at 5:53 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > Thanks for the comments, Till and Wenlong.
> > > > >> > > > >
> > > > >> > > > > @Wenlong
> > > > >> > > > > Regarding slot sharing, the general idea is to request a slot with resources for tasks of the entire slot sharing group. Details can be found in FLIP-53 [1], regarding how to decide the slot sharing groups and how to manage task resources within the shared slots.
> > > > >> > > > >
> > > > >> > > > > Thank you~
> > > > >> > > > >
> > > > >> > > > > Xintong Song
> > > > >> > > > >
> > > > >> > > > > On Mon, Sep 16, 2019 at 10:42 AM wenlong.lwl <wenlong88....@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Hi, Xintong, thanks for the great proposal. big +1 for the feature! It is something like mapreduce-1.0 to mapreduce-2.0.
> > > > >> > > > > >
> > > > >> > > > > > I like the design on the whole. One point may need to be included in the proposal: How we deal with slot share group and dynamic slot allocation? It can be quite different with dynamic slot allocation.
> > > > >> > > > > >
> > > > >> > > > > > On Fri, 13 Sep 2019 at 16:42, Till Rohrmann <trohrm...@apache.org> wrote:
> > > > >> > > > > >
> > > > >> > > > > > > Thanks for the update Xintong. From a high level perspective the implementation plan looks good to me.
> > > > >> > > > > > >
> > > > >> > > > > > > Cheers,
> > > > >> > > > > > > Till
> > > > >> > > > > > >
> > > > >> > > > > > > On Thu, Sep 12, 2019 at 11:04 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Added implementation steps for this FLIP on the wiki page [1].
> > > > >> > > > > > > >
> > > > >> > > > > > > > Thank you~
> > > > >> > > > > > > >
> > > > >> > > > > > > > Xintong Song
> > > > >> > > > > > > >
> > > > >> > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > >> > > > > > > >
> > > > >> > > > > > > > On Tue, Aug 20, 2019 at 3:43 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >> > > > > > > >
> > > > >> > > > > > > > > @Zili
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > As far as I know, Timo is drafting a FLIP that has taken the number 55. There is a round-up number maintained on the FLIP wiki page [1] shows which number should be used for the new FLIP, which should be increased by whoever takes the number for a new FLIP.
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Thank you~
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > Xintong Song
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > > >> > > > > > > > >
> > > > >> > > > > > > > > On Tue, Aug 20, 2019 at 3:28 AM Zili Chen <wander4...@gmail.com> wrote:
> > > > >> > > > > > > > >
> > > > >> > > > > > > > >> We suddenly skipped FLIP-55 lol.
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> On Mon, Aug 19, 2019 at 10:23 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >> > > > > > > > >>
> > > > >> > > > > > > > >> > Hi everyone,
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > We would like to start a discussion thread on "FLIP-56: Dynamic Slot Allocation" [1]. This is originally part of the discussion thread for "FLIP-53: Fine Grained Resource Management" [2]. As Till suggested, we would like to split the original discussion into two topics, and start a separate new discussion thread as well as FLIP process for this one.
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > Thank you~
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > Xintong Song
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > >> > > > > > > > >> >
> > > > >> > > > > > > > >> > [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-53-Fine-Grained-Resource-Management-td31831.html
>
> --
> Regards,
> Tao