@Kurt You raise some good points. These are tricky issues indeed. Here are some thoughts:

(1) I think the resources required for a function can only be decided by the user (at least in a first version). If I recall correctly, Blink used annotations in the user code (on Yarn) to define how many resources a function should require. For all cases where no such annotations are set, I think we should interpret that as "no special requirements" and request slots of a default size.

  - For standalone, the size of slots is simply determined by the size of the TaskManager process, divided by the number of slots.
  - For Yarn, I think we need to ask for a default container size, similar to what we do in the current version (through -ym and other flags).
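To make this concrete, deriving the default slot profile could look roughly like the sketch below. This is only an illustration; the class and method names are invented, not existing Flink classes.

    // Hypothetical sketch: how a default per-slot resource profile could
    // be derived in the two modes. All names are made up for this example.
    public final class SlotProfiles {

        /** What one slot provides: CPU and memory. */
        public static final class ResourceProfile {
            final double cpuCores;
            final long memoryBytes;

            ResourceProfile(double cpuCores, long memoryBytes) {
                this.cpuCores = cpuCores;
                this.memoryBytes = memoryBytes;
            }
        }

        /**
         * Standalone: the TaskManager process size is fixed, so a slot is
         * simply the process resources divided by the number of slots.
         */
        public static ResourceProfile standaloneDefault(
                double tmCpuCores, long tmMemoryBytes, int numSlots) {
            return new ResourceProfile(
                    tmCpuCores / numSlots, tmMemoryBytes / numSlots);
        }

        /**
         * Yarn: no annotation means "no special requirements", so fall
         * back to a configured default container size (analogous to what
         * -ym and related flags set today), divided by slots per container.
         */
        public static ResourceProfile yarnDefault(
                double containerCores, long containerMemoryBytes, int slotsPerContainer) {
            return new ResourceProfile(
                    containerCores / slotsPerContainer,
                    containerMemoryBytes / slotsPerContainer);
        }
    }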
(2) Slot sharing on the level of SlotSharingGroup and CoLocationConstraint is something that I would like to keep out of the ResourceManager/SlotPool/etc. These concepts may actually go away in the future (I would definitely like to remove the CoLocationConstraint once we have cleaned up a few things in the iterations code).

The ResourceManager would think about combining slots into containers (i.e. allocating multi-slot containers). It could, for example, allocate a 2-vcore container with 10 slots of 0.2 vcores each. The best way to think about a slot, in that sense, is as the unit that is independently allocated and released by the scheduler.
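As a back-of-the-envelope illustration of that packing (and of the integer-vcore limitation Kurt mentions below), the arithmetic could look like this. A sketch only: the names are invented, and working in milli-vcores is just one way to avoid floating-point rounding surprises.

    // Sketch: resource math a ResourceManager might do when packing slots
    // into Yarn containers. Yarn only hands out integer vcores, so we
    // work in milli-vcores (integers) rather than fractional doubles.
    public final class ContainerPacking {

        /** How many slots of the given size fit into an integer-vcore
         *  container? E.g. 200 milli-vcores per slot, 2 vcores -> 10 slots. */
        static int slotsPerContainer(int milliVcoresPerSlot, int containerVcores) {
            return (containerVcores * 1000) / milliVcoresPerSlot;
        }

        /** Smallest integer-vcore container that can host the requested
         *  slots (Yarn cannot allocate a 0.2-vcore container directly). */
        static int vcoresForSlots(int numSlots, int milliVcoresPerSlot) {
            int milliTotal = numSlots * milliVcoresPerSlot;
            return (milliTotal + 999) / 1000; // integer ceiling division
        }

        public static void main(String[] args) {
            System.out.println(slotsPerContainer(200, 2)); // 10 slots of 0.2 vcores
            System.out.println(vcoresForSlots(3, 200));    // 3 such slots need 1 vcore
        }
    }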
Greetings,
Stephan

On Mon, Aug 1, 2016 at 3:44 AM, Kurt Young <ykt...@gmail.com> wrote:

> Thanks for the great proposal.
>
> There are still 2 issues I am concerned with and would like to discuss.
>
> #1 Who should decide the resources one operator uses, the user or the
> framework? For example, how much CPU or memory my "map" operator will
> cost seems a little too low-level for users. Should we expose some APIs
> for this?
>
> #2 Who decides to combine the slots into a real container in Yarn and
> Mesos mode? Currently, Flink has an optimization for resource
> utilization called SlotSharingGroup. This takes effect before Flink
> allocates resources: we combine as many operators as we can into one
> single *SharedSlot* (which I think is still a Slot). It seems all the
> combination and optimization are done before we allocate resources, so
> should we distinguish between slots and containers (if we want to
> introduce this concept, which I think is needed by standalone mode)?
> If the answer is yes, it will lead us to the situation that both the
> JobManager and the ResourceManager know how to utilize resources. For
> logic like SlotSharingGroup, it is more appropriate to let the
> Scheduler handle it, because the Scheduler has a lot of information
> about the JobGraph and the constraints on it. But for other logic that
> is purely resource-aware or cluster-specific, we may consider letting
> the ResourceManager handle it. E.g., there is a limitation in Yarn's
> allocation: we can only allocate containers with "integer" vcores, so
> it is not possible for us to have 0.1 or 0.2 vcores for now. We have
> bypassed this by combining some operators into one slot; otherwise it
> would cause a waste of resources. Still, I think it is better if we can
> make only one role aware of all the resource utilization.
>
> Thanks,
> Kurt
>
> On Thu, Jul 28, 2016 at 5:22 PM, Maximilian Michels <m...@apache.org>
> wrote:
>
> > Hi Stephan,
> >
> > Thanks for the nice wrap-up of ideas and discussions we had over the
> > last months (not all on the mailing list, though, because we were
> > just getting started with the FLIP process). The document is very
> > comprehensive and explains the changes in great detail, even down to
> > the message-passing level.
> >
> > What I really like about the FLIP is that we delegate multi-tenancy
> > away from the JobManager to the resource management framework and the
> > dispatchers. This will help to make the JobManager component cleaner
> > and simpler. The prospect of having the user jars directly in the
> > system classpath of the workers, instead of dealing with custom class
> > loaders, is very nice.
> >
> > The model we have for acquiring and releasing resources wouldn't work
> > particularly well with all the new deployment options, so +1 on a new
> > task slot request/offer system, and +1 for making the ResourceManager
> > responsible for TaskManager registration and slot management. This is
> > well aligned with the initial idea of the ResourceManager component.
> >
> > We definitely need good testing for these changes, since the
> > possibility of bugs increases with the additional number of messages
> > introduced.
> >
> > The only thing that bugs me is whether we make the Standalone mode a
> > bit less nice to use. The initial bootstrapping of the nodes via the
> > local dispatchers, and the subsequent registration of TaskManagers
> > and allocation of slots, could cause some delay. It's not a major
> > concern, though, because it will take little time compared to the
> > actual job run time (unless you run a tiny WordCount).
> >
> > Cheers,
> > Max
> >
> > On Fri, Jul 22, 2016 at 9:26 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> > > Hi all!
> > >
> > > Here comes a pretty big FLIP: "Improvements to the Flink Deployment
> > > and Process Model", to better support Yarn, Mesos, Kubernetes, and
> > > whatever else Google, Elon Musk, and all the other folks will think
> > > up next.
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
> > >
> > > It is a pretty big FLIP where I took input and thoughts from many
> > > people, like Till, Max, Xiaowei (and his colleagues), Eron, and
> > > others.
> > >
> > > The core ideas revolve around
> > >   - making the JobManager in its core a per-job component (handle
> > >     multi-tenancy outside the JobManager)
> > >   - making resource acquisition and release more dynamic
> > >   - tying deployments more naturally to jobs where desirable
> > >
> > > Let's get the discussion started...
> > >
> > > Greetings,
> > > Stephan
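PS: For illustration, the slot request/offer flow that Max gives a +1 to above could be expressed with messages roughly like the following. This is only a sketch; every type and field name below is invented, not taken from the FLIP, and only the overall flow follows the proposal.

    // Sketch of the messages in a slot request/offer protocol. Only the
    // overall flow (JobManager -> ResourceManager -> TaskManager ->
    // JobManager) follows the FLIP; all names here are hypothetical.
    public final class SlotProtocolMessages {

        /** JobManager -> ResourceManager: request a slot for a job. */
        static final class RequestSlot {
            final String jobId;
            final String allocationId; // lets the JM match offers to requests

            RequestSlot(String jobId, String allocationId) {
                this.jobId = jobId;
                this.allocationId = allocationId;
            }
        }

        /** ResourceManager -> TaskManager: reserve a slot for that job. */
        static final class AllocateSlot {
            final String allocationId;
            final String jobManagerAddress;

            AllocateSlot(String allocationId, String jobManagerAddress) {
                this.allocationId = allocationId;
                this.jobManagerAddress = jobManagerAddress;
            }
        }

        /** TaskManager -> JobManager: offer the reserved slot. */
        static final class OfferSlot {
            final String allocationId;
            final int slotIndex; // which slot on the TaskManager

            OfferSlot(String allocationId, int slotIndex) {
                this.allocationId = allocationId;
                this.slotIndex = slotIndex;
            }
        }
    }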