Hi Xintong,

thanks for drafting this FLIP. I think your proposal helps make the execution of batch jobs more efficient. Moreover, it enables the proper integration of the Blink planner, which is very important as well.
Overall, the FLIP looks good to me. I was wondering whether it wouldn't make sense to split it up into two FLIPs: operator resource management and dynamic slot allocation. I think these two FLIPs could be seen as orthogonal, and splitting would decrease the scope of each individual FLIP.

Some smaller comments:
- I'm not sure whether we should pass in the default slot size via an environment variable. Without having unified the way Flink components are configured [1], I think it would be better to pass it in as part of the configuration.
- I would avoid returning a null value from TaskExecutorGateway#requestResource if the request cannot be fulfilled. We should either introduce an explicit return value saying this or throw an exception.

Concerning Yangze's comments: I think you are right that it would be helpful to make the selection strategy pluggable. Also, batching slot requests to the RM could be a good optimization. For the sake of keeping the scope of this FLIP smaller, I would try to tackle these things after the initial version has been completed (without spoiling these optimization opportunities). In particular, batching the slot requests depends on the current scheduler refactoring and could also be realized on the RM side only.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration

Cheers,
Till

On Fri, Aug 16, 2019 at 11:11 AM Yangze Guo <karma...@gmail.com> wrote:

> Hi Xintong,
>
> Thanks for proposing this FLIP. The general design looks good to me, +1
> for this feature.
>
> Since slots in the same task executor could have different resource
> profiles, we will run into the resource fragmentation problem. Think
> about this case:
> - Request A wants 1 GB of memory, while requests B & C want 0.5 GB each.
> - There are two task executors, T1 & T2, with 1 GB and 0.5 GB of free
>   memory respectively.
> If B comes first and we cut a slot from T1 for B, A must wait for
> resources to be freed by other tasks. But A could have been scheduled
> immediately if we had cut a slot from T2 for B.
>
> The logic of findMatchingSlot now becomes finding a task executor that
> has enough resources and then cutting a slot from it. The current method
> can be seen as a "first-fit strategy", which works well in general but is
> not always optimal.
>
> Actually, this problem can be abstracted as the "bin packing problem" [1].
> Here are some common approximation algorithms:
> - First fit
> - Next fit
> - Best fit
>
> But it becomes a multi-dimensional bin packing problem if we take CPU
> into account, and it is hard to define which fit is best then. Some
> research has addressed this problem, such as Tetris [2].
>
> Here are some thoughts about it:
> 1. We could make the strategy for finding a matching task executor
> pluggable, letting users configure the best strategy for their scenario.
> 2. We could support a batch request interface in the RM, because we have
> more opportunities to optimize if we have more information. If we know A,
> B, and C at the same time, we can always make the best decision.
>
> [1] http://www.or.deis.unibo.it/kp/Chapter8.pdf
> [2] https://www.cs.cmu.edu/~xia/resources/Documents/grandl_sigcomm14.pdf
>
> Best,
> Yangze Guo
>
> On Thu, Aug 15, 2019 at 10:40 PM Xintong Song <tonysong...@gmail.com>
> wrote:
> >
> > Hi everyone,
> >
> > We would like to start a discussion thread on "FLIP-53: Fine Grained
> > Resource Management" [1], where we propose how to improve Flink
> > resource management and scheduling.
> >
> > This FLIP mainly discusses the following issues.
> >
> > - How to support tasks with fine grained resource requirements.
> > - How to unify resource management for jobs with / without fine grained
> >   resource requirements.
> > - How to unify resource management for streaming / batch jobs.
> >
> > Key changes proposed in the FLIP are as follows.
> >
> > - Unify memory management for operators with / without fine grained
> >   resource requirements by applying a fraction based quota mechanism.
> > - Unify resource scheduling for streaming and batch jobs by setting
> >   slot sharing groups for pipelined regions during the compiling stage.
> > - Dynamically allocate slots from task executors' available resources.
> >
> > Please find more details in the FLIP wiki document [1]. Looking forward
> > to your feedback.
> >
> > Thank you~
> >
> > Xintong Song
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Resource+Management
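As an aside, Till's point about not returning null from TaskExecutorGateway#requestResource could, for example, be expressed with an Optional-style result. The sketch below is only an illustration under that assumption; RequestResourceSketch, ResourceProfile, and SlotAllocation are simplified stand-in names and do not reflect the actual gateway signature or Flink classes.

import java.util.Optional;

// Sketch: an explicit result instead of returning null when a resource request
// cannot be fulfilled. The types below are simplified stand-ins for illustration.
public class RequestResourceSketch {

    static final class ResourceProfile {
        final int memoryMb;
        ResourceProfile(int memoryMb) { this.memoryMb = memoryMb; }
    }

    static final class SlotAllocation {
        final String slotId;
        SlotAllocation(String slotId) { this.slotId = slotId; }
    }

    // Returns Optional.empty() instead of null when the task executor cannot
    // fulfill the request from its available resources.
    static Optional<SlotAllocation> requestResource(ResourceProfile requested, int availableMemoryMb) {
        if (requested.memoryMb > availableMemoryMb) {
            return Optional.empty();
        }
        return Optional.of(new SlotAllocation("slot-" + requested.memoryMb + "mb"));
    }

    public static void main(String[] args) {
        // A 1 GB request against 512 MB of free memory: the caller handles the
        // unfulfilled case explicitly rather than checking for null.
        Optional<SlotAllocation> result = requestResource(new ResourceProfile(1024), 512);
        System.out.println(result.map(s -> "allocated " + s.slotId)
                .orElse("request could not be fulfilled"));
    }
}

Throwing a dedicated exception from the gateway call would work as well; the point is simply that the "cannot fulfill" case becomes explicit in the API.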
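To make the pluggable selection strategy discussed by Yangze and Till a bit more concrete, here is a minimal Java sketch contrasting first-fit and best-fit when choosing the task executor to cut a slot from. The names MatchingStrategy, TaskExecutor, and findMatch are hypothetical illustrations, not APIs defined by FLIP-53 or existing in Flink, and only memory is considered (the multi-dimensional CPU+memory case is out of scope here).

import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch of a pluggable strategy for picking the task executor to cut a slot from.
public class MatchingStrategySketch {

    /** Free memory (in MB) currently available on a task executor. */
    record TaskExecutor(String id, int freeMemoryMb) {}

    /** Pluggable selection strategy: pick a task executor that can host the requested slot. */
    interface MatchingStrategy {
        Optional<TaskExecutor> findMatch(List<TaskExecutor> executors, int requestedMemoryMb);
    }

    /** First-fit: take the first task executor with enough free memory. */
    static final MatchingStrategy FIRST_FIT = (executors, requested) ->
            executors.stream()
                    .filter(te -> te.freeMemoryMb() >= requested)
                    .findFirst();

    /** Best-fit: take the task executor whose free memory exceeds the request by the least. */
    static final MatchingStrategy BEST_FIT = (executors, requested) ->
            executors.stream()
                    .filter(te -> te.freeMemoryMb() >= requested)
                    .min(Comparator.comparingInt((TaskExecutor te) -> te.freeMemoryMb() - requested));

    public static void main(String[] args) {
        // Yangze's example: T1 has 1 GB free, T2 has 0.5 GB free; request B wants 0.5 GB.
        List<TaskExecutor> executors = List.of(new TaskExecutor("T1", 1024), new TaskExecutor("T2", 512));
        int requestB = 512;

        // First-fit cuts the slot from T1, leaving no executor able to serve a later 1 GB
        // request A; best-fit cuts it from T2 and keeps T1 free for A.
        System.out.println(FIRST_FIT.findMatch(executors, requestB)); // Optional[TaskExecutor[id=T1, freeMemoryMb=1024]]
        System.out.println(BEST_FIT.findMatch(executors, requestB));  // Optional[TaskExecutor[id=T2, freeMemoryMb=512]]
    }
}

With such an interface, the first-fit behavior described in the FLIP could stay the default, and alternative strategies (best-fit, Tetris-style multi-resource packing) could be plugged in later without changing the RM's slot allocation flow.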