@Kurt

You raise some good points. These are tricky issues indeed. Here are some
thoughts:

(1)
I think the resources required for a function can only be decided by the
user (at least in a first version).

If I recall correctly, Blink used annotations in the user code to define
how many resources a function should require on Yarn.
For all cases where no such annotations are set, I think we should
interpret that as "no special requirements" - and request slots of a
default size for that. (A rough sketch of such a spec follows the list
below.)

  - For standalone, the size of slots is simply determined by the size of
the TaskManager process, divided by the number of slots.
  - For Yarn, I think we need to ask for a default container size, similar
to what we do in the current version (through the -ym and other flags)
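
To make this concrete, here is a rough sketch of what such a resource
spec could look like (all names here are made up for illustration - this
is not an API the FLIP defines):

    // Hypothetical sketch, not actual Flink API: a resource requirement
    // that a user could attach to a function. Unset values mean
    // "no special requirements" and fall back to the default slot size.
    public final class ResourceSpec {

        /** Signals "no special requirements": use the default slot size. */
        public static final ResourceSpec DEFAULT = new ResourceSpec(0.0, 0);

        private final double vcores;  // 0.0 = no CPU requirement set
        private final int memoryMb;   // 0   = no memory requirement set

        public ResourceSpec(double vcores, int memoryMb) {
            this.vcores = vcores;
            this.memoryMb = memoryMb;
        }

        public boolean isDefault() {
            return vcores == 0.0 && memoryMb == 0;
        }

        public double getVcores() { return vcores; }

        public int getMemoryMb() { return memoryMb; }
    }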

(2)
Slot sharing on the level of SlotSharingGroup and CoLocationConstraint is
something that I would like to keep out of the ResourceManager/SlotPool/etc.
These concepts may actually go away in the future (I would definitely like
to remove the CoLocationConstraint once we have cleaned up a few things in
the iterations code).

The ResourceManager would think about combining slots into containers (i.e.,
allocating multi-slot containers). It could, for example, allocate a 2-vcore
container with 10 slots of 0.2 vcores each (see the sketch below).
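
To illustrate the arithmetic (again just a sketch of the idea, not actual
ResourceManager code; I use integer milli-vcores to avoid floating-point
rounding surprises):

    // Illustrative sketch only. Yarn grants containers with integer
    // vcores, so fractional slot requests must be rounded up and packed.
    public final class ContainerPacking {

        /** Smallest integer vcore count that can hold the requested slots. */
        static int containerVcoresFor(int numSlots, int slotMilliVcores) {
            int totalMilli = numSlots * slotMilliVcores;
            return (totalMilli + 999) / 1000;  // round up to whole vcores
        }

        /** How many slots of the given size fit into one such container. */
        static int slotsPerContainer(int slotMilliVcores, int containerVcores) {
            return (containerVcores * 1000) / slotMilliVcores;
        }

        public static void main(String[] args) {
            // 10 slots of 0.2 vcores (200 milli-vcores) -> one 2-vcore container
            System.out.println(containerVcoresFor(10, 200));  // 2
            System.out.println(slotsPerContainer(200, 2));    // 10
        }
    }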

In that sense, the best way to think about a slot is as the unit that is
independently allocated and released by the scheduler.
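
In code terms, a minimal sketch of that contract could be (hypothetical
names again, reusing the ResourceSpec sketch from above):

    // The slot as the unit the scheduler allocates and releases
    // independently. Purely illustrative, not the actual interfaces.
    public interface Slot {

        /** The resources this slot was allocated with. */
        ResourceSpec getResourceSpec();

        /** Returns the slot to the pool; the scheduler no longer owns it. */
        void release();
    }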

Greetings,
Stephan


On Mon, Aug 1, 2016 at 3:44 AM, Kurt Young <ykt...@gmail.com> wrote:

> Thanks for the great proposal.
>
> There are still 2 issues I am concerned about that I would like to discuss.
>
> #1 Who should decide the resources one operator uses, the user or the
> framework? For example, how much CPU or memory will my "map" operator
> cost? Doesn't that seem a little too low-level for users? Should we
> expose some APIs for this?
>
> #2 Who decides how to combine slots into a real container in Yarn and
> Mesos mode? Currently, Flink has an optimization for resource utilization
> called SlotSharingGroup. It takes effect before Flink allocates resources:
> we combine as many operators as we can into one single *SharedSlot*
> (which I think is still a Slot). It seems all the combination and
> optimization is done before we allocate resources, so should we
> distinguish between slots and containers (if we want to introduce this
> concept, though I think it is needed by standalone mode)? If the answer
> is yes, it will lead us to a situation where both the JobManager and the
> ResourceManager know how to utilize resources. For logic like
> SlotSharingGroup, it is more appropriate to let the Scheduler handle it,
> because it has a lot of information about the JobGraph and the
> constraints on it. But for other logic that is purely resource-aware or
> cluster-specific, we may consider letting the ResourceManager handle it.
> E.g., there is a limitation in Yarn's allocation: we can only allocate
> containers with "integer" vcores, so it is not possible to have 0.1 or
> 0.2 vcores for now. We have bypassed this by combining several operators
> into one slot; otherwise it would waste resources. But I think it is
> better if we make only one role aware of all the resource utilization.
>
> Thanks
> Kurt
>
> On Thu, Jul 28, 2016 at 5:22 PM, Maximilian Michels <m...@apache.org>
> wrote:
>
> > Hi Stephan,
> >
> > Thanks for the nice wrap-up of ideas and discussions we had over the
> > last months (not all on the mailing list though because we were just
> > getting started with the FLIP process). The document is very
> > comprehensive and explains the changes in great detail, even up to
> > the message passing level.
> >
> > What I really like about the FLIP is that we delegate multi-tenancy
> > away from the JobManager to the resource management framework and the
> > dispatchers. This will help to make the JobManager component cleaner
> > and simpler. The prospect of having the user jars directly in the
> > system classpath of the workers, instead of dealing with custom class
> > loaders, is very nice.
> >
> > The model we have for acquiring and releasing resources wouldn't work
> > particularly well with all the new deployment options, so +1 on a new
> > task slot request/offer system and +1 for making the ResourceManager
> > responsible for TaskManager registration and slot management. This is
> > well aligned with the initial idea of the ResourceManager component.
> >
> > We definitely need good testing for these changes since the
> > possibility of bugs increases with the additional number of messages
> > introduced.
> >
> > The only thing that bugs me is whether we make the Standalone mode a
> > bit less nice to use. The initial bootstrapping of the nodes via the
> > local dispatchers and the subsequent registration of TaskManagers and
> > allocation of slots could cause some delay. It's not a major concern
> > though because it will take little time compared to the actual job run
> > time (unless you run a tiny WordCount).
> >
> > Cheers,
> > Max
> >
> >
> >
> >
> > On Fri, Jul 22, 2016 at 9:26 PM, Stephan Ewen <se...@apache.org> wrote:
> > > Hi all!
> > >
> > > Here comes a pretty big FLIP: "Improvements to the Flink Deployment and
> > > Process Model", to better support Yarn, Mesos, Kubernetes, and whatever
> > > else Google, Elon Musk, and all the other folks will think up next.
> > >
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
> > >
> > > It is a pretty big FLIP where I took input and thoughts from many
> > > people, like Till, Max, Xiaowei (and his colleagues), Eron, and others.
> > >
> > > The core ideas revolve around
> > >   - making the JobManager in its core a per-job component (handling
> > > multi-tenancy outside the JobManager)
> > >   - making resource acquisition and release more dynamic
> > >   - tying deployments more naturally to jobs where desirable
> > >
> > >
> > > Let's get the discussion started...
> > >
> > > Greetings,
> > > Stephan
> >
>