Hi Xintong,

Thanks for your detailed proposal. I think many users are suffering from
wasted resources: since all task managers have the same resource spec, we
have to scale up all task managers just to make the heaviest one more
stable. So we would benefit a lot from fine-grained resource management,
getting better resource utilization and stability.


Just to share some thoughts.



   1. How are the resource specifications of TaskManagers calculated? Do
   they all have the same resource spec derived from the configuration? I
   think we would still have wasted resources in that case. Or could we
   start TaskManagers with different specs?
   2. If a slot is released and returned to the SlotPool, can it be reused
   by another SlotRequest whose requested resources are smaller?
      - If so, what happens to the available resources in the
      TaskManager?
      - What is the SlotStatus of the cached slot in the SlotPool? Is the
      AllocationId null?
   3. In a session cluster, some jobs may be configured with operator
   resources while other jobs use UNKNOWN. How should this situation be
   handled?
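On question 2, the kind of matching check I have in mind can be sketched as
follows (a hypothetical illustration only; the ResourceProfile here is a
stand-in, not the actual Flink class):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceProfile:
    """Stand-in for a slot's resource spec (illustration only)."""
    cpu_cores: float
    memory_mb: int

    def can_fulfill(self, request: "ResourceProfile") -> bool:
        # A cached slot could serve any request that fits within it.
        return (request.cpu_cores <= self.cpu_cores
                and request.memory_mb <= self.memory_mb)

# A slot released back to the SlotPool ...
cached_slot = ResourceProfile(cpu_cores=2.0, memory_mb=1024)
# ... and a smaller incoming SlotRequest.
small_request = ResourceProfile(cpu_cores=1.0, memory_mb=512)

assert cached_slot.can_fulfill(small_request)
# The open question: if reuse is allowed as a whole-slot match, 1 core and
# 512 MB of the cached slot sit idle, invisible to the TaskManager's
# accounting of its available resources.
```

If such whole-slot reuse is allowed, the difference between the cached slot
and the smaller request is exactly the resource that goes unaccounted for,
which is what the sub-questions above are getting at.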



Best,
Yang

Xintong Song <tonysong...@gmail.com> wrote on Fri, Aug 16, 2019 at 8:57 PM:

> Thanks for the feedback, Yangze and Till.
>
> Yangze,
>
> I agree with you that we should make the scheduling strategy pluggable and
> optimize the strategy to reduce the memory fragmentation problem, and
> thanks for the inputs on the potential algorithmic solutions. However, I'm
> in favor of keeping this FLIP focused on the overall mechanism design rather
> than on strategies. Solving the fragmentation issue should be considered an
> optimization, and I agree with Till that we probably should tackle it
> afterwards.
>
> Till,
>
> - Regarding splitting the FLIP, I think it makes sense. The operator
> resource management and dynamic slot allocation do not have much dependency
> on each other.
>
> - Regarding the default slot size, I think this is similar to FLIP-49 [1],
> where we want all the derivation to happen in one place. I think it would be
> nice to pass the default slot size into the task executor in the same way
> that we pass in the memory pool sizes in FLIP-49 [1].
>
> - Regarding the return value of TaskExecutorGateway#requestResource, I
> think you're right. We should avoid using null as the return value. I think
> we probably should throw an exception here.
>
> Thank you~
>
> Xintong Song
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>
> On Fri, Aug 16, 2019 at 2:18 PM Till Rohrmann <trohrm...@apache.org>
> wrote:
>
> > Hi Xintong,
> >
> > thanks for drafting this FLIP. I think your proposal helps to execute
> > batch jobs more efficiently. Moreover, it enables the proper
> > integration of the Blink planner, which is very important as well.
> >
> > Overall, the FLIP looks good to me. I was wondering whether it wouldn't
> > make sense to actually split it up into two FLIPs: operator resource
> > management and dynamic slot allocation. I think these two FLIPs could be
> > seen as orthogonal and it would decrease the scope of each individual
> > FLIP.
> >
> > Some smaller comments:
> >
> > - I'm not sure whether we should pass in the default slot size via an
> > environment variable. Without having unified the way Flink components
> > are configured [1], I think it would be better to pass it in as part of
> > the configuration.
> > - I would avoid returning a null value from
> > TaskExecutorGateway#requestResource if it cannot be fulfilled. Either we
> > should introduce an explicit return value saying this or throw an
> > exception.
> >
> > Concerning Yangze's comments: I think you are right that it would be
> > helpful to make the selection strategy pluggable. Also batching slot
> > requests to the RM could be a good optimization. For the sake of keeping
> > the scope of this FLIP smaller I would try to tackle these things after
> > the initial version has been completed (without spoiling these
> > optimization opportunities). In particular batching the slot requests
> > depends on the current scheduler refactoring and could also be realized
> > on the RM side only.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
> >
> > Cheers,
> > Till
> >
> >
> >
> > On Fri, Aug 16, 2019 at 11:11 AM Yangze Guo <karma...@gmail.com> wrote:
> >
> > > Hi Xintong,
> > >
> > > Thanks for proposing this FLIP. The general design looks good to me; +1
> > > for this feature.
> > >
> > > Since slots in the same task executor could have different resource
> > > profiles, we will face a resource fragmentation problem. Consider this
> > > case:
> > >  - Request A wants 1G memory while requests B & C each want 0.5G memory.
> > >  - There are two task executors T1 & T2 with 1G and 0.5G free memory
> > > respectively.
> > > If B comes first and we cut a slot from T1 for B, A must wait for
> > > resources freed by other tasks. But A could have been scheduled
> > > immediately if we had cut the slot for B from T2.
> > >
> > > The logic of findMatchingSlot now becomes finding a task executor which
> > > has enough resources and then cutting a slot from it. The current method
> > > could be seen as a "first-fit strategy", which works well in general but
> > > is sometimes not the optimal method.
> > >
> > > Actually, this problem can be abstracted as the bin packing problem [1].
> > > Here are some common approximation algorithms:
> > > - First fit
> > > - Next fit
> > > - Best fit
> > >
> > > But it becomes a multi-dimensional bin packing problem if we take CPU
> > > into account, and it is then hard to define which fit is "best". Some
> > > research has addressed this problem, such as Tetris [2].
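To make the first-fit vs. best-fit difference concrete on the A/B/C example
above, here is a minimal sketch (plain Python as illustration, not Flink
code; only the memory dimension is considered):

```python
def first_fit(request_mb, executors):
    """Return the index of the first executor with enough free memory."""
    for i, free in enumerate(executors):
        if free >= request_mb:
            return i
    return None

def best_fit(request_mb, executors):
    """Return the index of the fitting executor with the least leftover."""
    candidates = [(free - request_mb, i)
                  for i, free in enumerate(executors)
                  if free >= request_mb]
    return min(candidates)[1] if candidates else None

# Two task executors T1 and T2 with 1G and 0.5G free memory.
free_memory = [1024, 512]

# Request B (0.5G) arrives first.
assert first_fit(512, free_memory) == 0  # cuts from T1: A (1G) is now stuck
assert best_fit(512, free_memory) == 1   # cuts from T2: T1's 1G stays free for A
```

With first fit, B lands on T1 and A can no longer be placed; with best fit,
B lands on T2 and T1's full 1G remains available for A.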
> > >
> > > Here are some thoughts about it:
> > > 1. We could make the strategy of finding a matching task executor
> > > pluggable, letting users configure the best strategy for their scenario.
> > > 2. We could support a batch request interface in the RM, because we have
> > > more opportunities to optimize if we have more information. If we knew
> > > A, B, and C at the same time, we could always make the best decision.
> > >
> > > [1] http://www.or.deis.unibo.it/kp/Chapter8.pdf
> > > [2] https://www.cs.cmu.edu/~xia/resources/Documents/grandl_sigcomm14.pdf
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Thu, Aug 15, 2019 at 10:40 PM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > We would like to start a discussion thread on "FLIP-53: Fine Grained
> > > > Resource Management" [1], where we propose how to improve Flink
> > > > resource management and scheduling.
> > > >
> > > > This FLIP mainly discusses the following issues.
> > > >
> > > >    - How to support tasks with fine grained resource requirements.
> > > >    - How to unify resource management for jobs with / without fine
> > > >    grained resource requirements.
> > > >    - How to unify resource management for streaming / batch jobs.
> > > >
> > > > Key changes proposed in the FLIP are as follows.
> > > >
> > > >    - Unify memory management for operators with / without fine grained
> > > >    resource requirements by applying a fraction based quota mechanism.
> > > >    - Unify resource scheduling for streaming and batch jobs by setting
> > > >    slot sharing groups for pipelined regions during the compiling stage.
> > > >    - Dynamically allocate slots from task executors' available
> > > >    resources.
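As a rough illustration of the third point, dynamic slot allocation could be
sketched like this (assumed names and a single memory dimension; a sketch of
the idea, not the proposed implementation):

```python
class TaskExecutorResources:
    """Tracks a task executor's free resources; slots are cut dynamically."""

    def __init__(self, total_memory_mb):
        self.available_mb = total_memory_mb
        self.allocated_slots = {}

    def request_slot(self, allocation_id, memory_mb):
        # Instead of picking from fixed-size slots, cut exactly what is asked.
        if memory_mb > self.available_mb:
            # Signal failure explicitly rather than returning null.
            raise RuntimeError("insufficient resources")
        self.available_mb -= memory_mb
        self.allocated_slots[allocation_id] = memory_mb

    def release_slot(self, allocation_id):
        # Released resources return to the executor's shared pool.
        self.available_mb += self.allocated_slots.pop(allocation_id)

te = TaskExecutorResources(total_memory_mb=2048)
te.request_slot("a1", 512)
te.request_slot("a2", 1024)
assert te.available_mb == 512
te.release_slot("a1")
assert te.available_mb == 1024
```

Releasing a slot returns its resources to the executor's pool, so a later
request of a different size can be cut from them.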
> > > >
> > > > Please find more details in the FLIP wiki document [1]. Looking
> > > > forward to your feedback.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Resource+Management
> > >
> >
>
