Thanks for the feedback, Yangze and Till.

Yangze,

I agree with you that we should make the scheduling strategy pluggable and
optimize it to reduce the memory fragmentation problem, and thanks for the
input on potential algorithmic solutions. However, I'm in favor of keeping
this FLIP focused on the overall mechanism design rather than on concrete
strategies. Solving the fragmentation issue should be considered an
optimization, and I agree with Till that we should probably tackle it
afterwards.

Till,

- Regarding splitting the FLIP, I think it makes sense. The operator
resource management and dynamic slot allocation do not have much dependency
on each other.
- Regarding the default slot size, I think this is similar to FLIP-49 [1],
where we want all the derivation to happen in one place. I think it would be
nice to pass the default slot size into the task executor in the same way
that we pass in the memory pool sizes in FLIP-49 [1].
- Regarding the return value of TaskExecutorGateway#requestResource, I think
you're right. We should avoid using null as the return value. I think we
should probably throw an exception here.
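To make this concrete, here is a rough sketch (everything apart from the
method name is illustrative; SlotID and ResourceProfile stand for the
existing runtime types, and SlotAllocationException for whatever exception
type we end up choosing):

  // Sketch only: never return null. If the remaining resources of the task
  // executor cannot fulfill the request, fail with an exception (for an
  // asynchronous gateway call, the returned future would complete
  // exceptionally instead).
  public interface TaskExecutorGateway {
      SlotID requestResource(ResourceProfile resourceProfile)
              throws SlotAllocationException;
  }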
Thank you~

Xintong Song

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors

On Fri, Aug 16, 2019 at 2:18 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Xintong,
>
> thanks for drafting this FLIP. I think your proposal helps to execute
> batch jobs more efficiently. Moreover, it enables the proper integration
> of the Blink planner, which is very important as well.
>
> Overall, the FLIP looks good to me. I was wondering whether it wouldn't
> make sense to actually split it up into two FLIPs: operator resource
> management and dynamic slot allocation. I think these two FLIPs could be
> seen as orthogonal, and it would decrease the scope of each individual
> FLIP.
>
> Some smaller comments:
>
> - I'm not sure whether we should pass in the default slot size via an
> environment variable. Without having unified the way Flink components are
> configured [1], I think it would be better to pass it in as part of the
> configuration.
> - I would avoid returning a null value from
> TaskExecutorGateway#requestResource if it cannot be fulfilled. Either we
> should introduce an explicit return value saying this or throw an
> exception.
>
> Concerning Yangze's comments: I think you are right that it would be
> helpful to make the selection strategy pluggable. Also, batching slot
> requests to the RM could be a good optimization. For the sake of keeping
> the scope of this FLIP smaller, I would try to tackle these things after
> the initial version has been completed (without spoiling these
> optimization opportunities). In particular, batching the slot requests
> depends on the current scheduler refactoring and could also be realized
> on the RM side only.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
>
> Cheers,
> Till
>
> On Fri, Aug 16, 2019 at 11:11 AM Yangze Guo <karma...@gmail.com> wrote:
>
> > Hi Xintong,
> >
> > Thanks for proposing this FLIP. The general design looks good to me;
> > +1 for this feature.
> >
> > Since slots in the same task executor could have different resource
> > profiles, we will run into a resource fragmentation problem. Think
> > about this case:
> >
> > - Request A wants 1G of memory, while requests B & C want 0.5G each.
> > - There are two task executors T1 & T2, with 1G and 0.5G free memory
> > respectively.
> >
> > If B comes first and we cut a slot from T1 for B, A must wait for other
> > tasks to free resources. But A could have been scheduled immediately if
> > we had cut the slot for B from T2 instead.
> >
> > The logic of findMatchingSlot now becomes finding a task executor that
> > has enough resources and then cutting a slot from it. The current
> > method can be seen as a "first fit" strategy, which works well in
> > general but is not always optimal.
> >
> > Actually, this problem can be abstracted as the bin packing problem
> > [1]. Here are some common approximation algorithms:
> >
> > - First fit
> > - Next fit
> > - Best fit
> >
> > But it becomes a multi-dimensional bin packing problem once we take CPU
> > into account, and it is hard to define which candidate is the best fit
> > then. Some research has addressed this problem, such as Tetris [2].
> >
> > Here are some thoughts about it:
> >
> > 1. We could make the strategy of finding a matching task executor
> > pluggable and let users configure the best strategy for their scenario
> > (see the rough sketch below).
> > 2. We could support a batch request interface in the RM, because more
> > information gives us more opportunities to optimize. If we know A, B
> > and C at the same time, we can always make the best decision.
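> >
> > To illustrate point 1, here is a rough sketch of what a pluggable
> > strategy could look like. The interface name and the plain "free memory
> > in MB" map are simplifications made up for this sketch, not an actual
> > proposal, and only memory is considered:
> >
> >   import java.util.Comparator;
> >   import java.util.Map;
> >   import java.util.Optional;
> >
> >   // A pluggable strategy picks the task executor to cut the requested
> >   // slot from.
> >   interface TaskExecutorMatchingStrategy {
> >       Optional<String> chooseTaskExecutor(
> >               long requestedMemoryMb, Map<String, Long> freeMemoryMb);
> >   }
> >
> >   // Best fit: choose the executor that would be left with the least
> >   // free memory. In the example above this cuts B's 0.5G slot from T2
> >   // and keeps T1's full 1G available for A.
> >   class BestFitStrategy implements TaskExecutorMatchingStrategy {
> >       @Override
> >       public Optional<String> chooseTaskExecutor(
> >               long requestedMemoryMb, Map<String, Long> freeMemoryMb) {
> >           return freeMemoryMb.entrySet().stream()
> >                   .filter(e -> e.getValue() >= requestedMemoryMb)
> >                   .min(Comparator.comparingLong(
> >                           (Map.Entry<String, Long> e) ->
> >                                   e.getValue() - requestedMemoryMb))
> >                   .map(Map.Entry::getKey);
> >       }
> >   }
> >
> > A first-fit implementation would simply return the first executor with
> > enough free memory; swapping strategies would not require touching the
> > rest of the slot allocation logic.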
> >
> > [1] http://www.or.deis.unibo.it/kp/Chapter8.pdf
> > [2] https://www.cs.cmu.edu/~xia/resources/Documents/grandl_sigcomm14.pdf
> >
> > Best,
> > Yangze Guo
> >
> > On Thu, Aug 15, 2019 at 10:40 PM Xintong Song <tonysong...@gmail.com>
> > wrote:
> > >
> > > Hi everyone,
> > >
> > > We would like to start a discussion thread on "FLIP-53: Fine Grained
> > > Resource Management" [1], where we propose how to improve Flink
> > > resource management and scheduling.
> > >
> > > This FLIP mainly discusses the following issues.
> > >
> > > - How to support tasks with fine grained resource requirements.
> > > - How to unify resource management for jobs with / without fine
> > > grained resource requirements.
> > > - How to unify resource management for streaming / batch jobs.
> > >
> > > Key changes proposed in the FLIP are as follows.
> > >
> > > - Unify memory management for operators with / without fine grained
> > > resource requirements by applying a fraction based quota mechanism.
> > > - Unify resource scheduling for streaming and batch jobs by setting
> > > slot sharing groups for pipelined regions during the compiling stage.
> > > - Dynamically allocate slots from task executors' available
> > > resources.
> > >
> > > Please find more details in the FLIP wiki document [1]. Looking
> > > forward to your feedback.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Resource+Management
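> > >
> > > To give a feeling for the fraction based quota mechanism, here is a
> > > rough, simplified illustration. The class and the numbers below are
> > > made up for this sketch; the actual derivation is described in the
> > > wiki document [1]:
> > >
> > >   import java.util.List;
> > >
> > >   // Operators in a slot get fractions of the slot's managed memory
> > >   // rather than absolute sizes, so the same quota logic applies with
> > >   // and without fine grained requirements.
> > >   class ManagedMemoryFractions {
> > >
> > >       // Fine grained case: fractions proportional to the declared
> > >       // requirements, e.g. 256 MB and 768 MB -> 0.25 and 0.75.
> > >       static double[] fromDeclaredRequirements(List<Long> requirementsMb) {
> > >           long total = requirementsMb.stream()
> > >                   .mapToLong(Long::longValue).sum();
> > >           return requirementsMb.stream()
> > >                   .mapToDouble(r -> r.doubleValue() / total)
> > >                   .toArray();
> > >       }
> > >
> > >       // Unknown requirements: share the slot evenly,
> > >       // e.g. 4 operators -> each gets a fraction of 0.25.
> > >       static double fromUnknownRequirements(int numOperatorsInSlot) {
> > >           return 1.0 / numOperatorsInSlot;
> > >       }
> > >   }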