Hi everyone,

I'm a bit late to the party. I think the current proposal looks good.

Concerning the ExternalResourceDriver interface defined in the FLIP [1], I
would suggest not including the decorator calls for Kubernetes and Yarn in
the base interface. Instead, I would segregate the deployment-specific
decorator calls into separate interfaces. That way, an ExternalResourceDriver
does not have to support all deployment targets from the very beginning.
Moreover, some resources might not be supported by a specific deployment
target, and the natural way to express this would be to simply not implement
the respective deployment-specific interface. Finally, having
void addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
in the ExternalResourceDriver interface would require Hadoop on Flink's
classpath whenever the external resource driver is being used. A rough sketch
of the segregation I have in mind is appended at the very bottom of this mail,
below the quoted history.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink

Cheers,
Till

On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <se...@apache.org> wrote:

> Nice, thanks a lot!
>
> On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <karma...@gmail.com> wrote:
>
> > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> >
> > I've updated the FLIP accordingly. I do not add a
> > ResourceInfoProvider. Instead, I introduce the ExternalResourceDriver,
> > which takes the responsibility of all relevant operations on both RM
> > and TM sides.
> > After a rethink about decoupling the management of external resources
> > from TaskExecutor, I think we could do the same thing on the
> > ResourceManager side. We do not need to add a specific allocation
> > logic to the ResourceManager each time we add a specific external
> > resource.
> > - For Yarn, we need the ExternalResourceDriver to edit the
> > containerRequest.
> > - For Kubenetes, ExternalResourceDriver could provide a decorator for
> > the TM pod.
> >
> > In this way, just like MetricReporter, we allow users to define their
> > custom ExternalResourceDriver. It is more extensible and fits the
> > separation of concerns. For more details, please take a look at [1].
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> >
> > Best,
> > Yangze Guo
> >
> > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <se...@apache.org> wrote:
> > >
> > > This sounds good to go ahead from my side.
> > >
> > > I like the approach that Becket suggested - in that case the core
> > > abstraction that everyone would need to understand would be "external
> > > resource allocation" and the "ResourceInfoProvider", and the GPU specific
> > > code would be a specific implementation only known to that component
> > > that allocates the external resource. That fits the separation of
> > > concerns well.
> > >
> > > I also understand that it should not be over-engineered in the first
> > > version, so some simplification makes sense, and then gradually expand
> > > from there.
> > >
> > > So +1 to go ahead with what was suggested above (Xintong / Becket) from
> > > my side.
> > >
> > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for the comments, Stephan & Becket.
> > > >
> > > > @Stephan
> > > >
> > > > I see your concern, and I completely agree with you that we should
> > > > first think about the "library" / "plugin" / "extension" style if
> > > > possible.
> > > >
> > > > > If GPUs are sliced and assigned during scheduling, there may be
> > > > > reason, although it looks that it would belong to the slot then.
> > > > > Is that what we are doing here?
> > > > > > > > > > > > In the current proposal, we do not have the GPUs sliced and assigned > to > > > > slots, because it could be problematic without dynamic slot > allocation. > > > > E.g., the number of GPUs might not be evenly divisible by the number > of > > > > slots. > > > > > > > > I think it makes sense to eventually have the GPUs assigned to slots. > > Even > > > > then, we might still need a TM level GPUManager (or ResourceProvider > > like > > > > Becket suggested). For memory, in each slot we can simply request the > > > > amount of memory, leaving it to JVM / OS to decide which memory > > (address) > > > > should be assigned. For GPU, and potentially other resources like > > FPGA, we > > > > need to explicitly specify which GPU (index) should be used. > > Therefore, we > > > > need some component at the TM level to coordinate which slot uses > which > > > > GPU. > > > > > > > > IMO, unless we say Flink will not support slot-level GPU slicing at > > least > > > > in the foreseeable future, I don't see a good way to avoid touching > > the TM > > > > core. To that end, I think Becket's suggestion points to a good > > direction, > > > > that supports more features (GPU, FPGA, etc.) with less coupling to > > the TM > > > > core (only needs to understand the general interfaces). The detailed > > > > implementation for specific resource types can even be encapsulated > as > > a > > > > library. > > > > > > > > @Becket > > > > > > > > Thanks for sharing your thought on the final state. Despite the > > details how > > > > the interfaces should look like, I think this is a really good > > abstraction > > > > for supporting general resource types. > > > > > > > > I'd like to further clarify that, the following three things are all > > that > > > > the "Flink core" needs to understand. > > > > > > > > - The *amount* of resource, for scheduling. Actually, we already > > have > > > > the Resource class in ResourceProfile and ResourceSpec for > extended > > > > resource. It's just not really used. > > > > - The *info*, that Flink provides to the operators / user codes. > > > > - The *provider*, which generates the info based on the amount. > > > > > > > > The "core" does not need to understand the specific implementation > > details > > > > of the above three. They can even be implemented in a 3rd-party > > library. > > > > Similar to how we allow users to define their custom MetricReporter. > > > > > > > > Thank you~ > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <becket....@gmail.com> > > wrote: > > > > > > > > > Thanks for the comment, Stephan. > > > > > > > > > > - If everything becomes a "core feature", it will make the > project > > hard > > > > > > to develop in the future. Thinking "library" / "plugin" / > > "extension" > > > > > style > > > > > > where possible helps. > > > > > > > > > > > > > > > Completely agree. It is much more important to design a mechanism > > than > > > > > focusing on a specific case. Here is what I am thinking to fully > > support > > > > > custom resource management: > > > > > 1. On the JM / RM side, use ResourceProfile and ResourceSpec to > > define > > > > the > > > > > resource and the amount required. They will be used to find > suitable > > TMs > > > > > slots to run the tasks. At this point, the resources are only > > measured by > > > > > amount, i.e. they do not have individual ID. > > > > > > > > > > 2. 
On the TM side, have something like *"ResourceInfoProvider"* to > > > > identify > > > > > and provides the detail information of the individual resource, > e.g. > > GPU > > > > > ID.. It is important because the operator may have to explicitly > > interact > > > > > with the physical resource it uses. The ResourceInfoProvider might > > look > > > > > like something below. > > > > > interface ResourceInfoProvider<INFO> { > > > > > Map<AbstractID, INFO> retrieveResourceInfo(OperatorId opId, > > > > > ResourceProfile resourceProfile); > > > > > } > > > > > > > > > > - There could be several "*ResourceInfoProvider*" configured on the > > TM to > > > > > retrieve the information for different resources. > > > > > - The TM will be responsible to assign those individual resources > to > > each > > > > > operator according to their requested amount. > > > > > - The operators will be able to get the ResourceInfo from their > > > > > RuntimeContext. > > > > > > > > > > If we agree this is a reasonable final state. We can adapt the > > current > > > > FLIP > > > > > to it. In fact it does not sound a big change to me. All the > proposed > > > > > configuration can be as is, it is just that Flink itself won't care > > about > > > > > them, instead a GPUInfoProviver implementing the > ResourceInfoProvider > > > > will > > > > > use them. > > > > > > > > > > Thanks, > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <se...@apache.org> > > wrote: > > > > > > > > > > > Hi all! > > > > > > > > > > > > The main point I wanted to throw into the discussion is the > > following: > > > > > > - With more and more use cases, more and more tools go into > Flink > > > > > > - If everything becomes a "core feature", it will make the > > project > > > > hard > > > > > > to develop in the future. Thinking "library" / "plugin" / > > "extension" > > > > > style > > > > > > where possible helps. > > > > > > > > > > > > - A good thought experiment is always: How many future > developers > > > > have > > > > > to > > > > > > interact with this code (and possibly understand it partially), > > even if > > > > > the > > > > > > features they touch have nothing to do with GPU support. If many > > > > > > contributors to unrelated features will have to touch it and > > understand > > > > > it, > > > > > > then let's think if there is a different solution. Maybe there is > > not, > > > > > but > > > > > > then we should be sure why. > > > > > > > > > > > > - That led me to raising this issue: If the GPU manager > becomes a > > > > core > > > > > > service in the TaskManager, Environment, RuntimeContext, etc. > then > > > > > everyone > > > > > > developing TM and streaming tasks need to understand the GPU > > manager. > > > > > That > > > > > > seems oddly specific, is my impression. > > > > > > > > > > > > Access to configuration seems not the right reason to do that. We > > > > should > > > > > > expose the Flink configuration from the RuntimeContext anyways. > > > > > > > > > > > > If GPUs are sliced and assigned during scheduling, there may be > > reason, > > > > > > although it looks that it would belong to the slot then. Is that > > what > > > > we > > > > > > are doing here? > > > > > > > > > > > > Best, > > > > > > Stephan > > > > > > > > > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song < > > tonysong...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Thanks for the feedback, Becket. 
> > > > > > > > > > > > > > IMO, eventually an operator should only see info of GPUs that > are > > > > > > dedicated > > > > > > > for it, instead of all GPUs on the machine/container in the > > current > > > > > > design. > > > > > > > It does not make sense to let the user who writes a UDF to > worry > > > > about > > > > > > > coordination among multiple operators running on the same > > machine. > > > > And > > > > > if > > > > > > > we want to limit the GPU info an operator sees, we should not > > let the > > > > > > > operator to instantiate GPUManager, which means we have to > expose > > > > > > something > > > > > > > through runtime context, either GPU info or some kind of > limited > > > > access > > > > > > to > > > > > > > the GPUManager. > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin < > becket....@gmail.com > > > > > > > > wrote: > > > > > > > > > > > > > > > It probably make sense for us to first agree on the final > > state. > > > > More > > > > > > > > specifically, will the resource info be exposed through > runtime > > > > > context > > > > > > > > eventually? > > > > > > > > > > > > > > > > If that is the final state and we have a seamless migration > > story > > > > > from > > > > > > > this > > > > > > > > FLIP to that final state, Personally I think it is OK to > > expose the > > > > > GPU > > > > > > > > info in the runtime context. > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song < > > > > tonysong...@gmail.com > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > @Yangze, > > > > > > > > > I think what Stephan means (@Stephan, please correct me if > > I'm > > > > > wrong) > > > > > > > is > > > > > > > > > that, we might not need to hold and maintain the GPUManager > > as a > > > > > > > service > > > > > > > > in > > > > > > > > > TaskManagerServices or RuntimeContext. An alternative is to > > > > create > > > > > / > > > > > > > > > retrieve the GPUManager only in the operators that need it, > > e.g., > > > > > > with > > > > > > > a > > > > > > > > > static method `GPUManager.get()`. > > > > > > > > > > > > > > > > > > @Stephan, > > > > > > > > > I agree with you on excluding GPUManager from > > > > TaskManagerServices. > > > > > > > > > > > > > > > > > > - For the first step, where we provide unified TM-level > > GPU > > > > > > > > information > > > > > > > > > to all operators, it should be fine to have operators > > access / > > > > > > > > > lazy-initiate GPUManager by themselves. > > > > > > > > > - In future, we might have some more fine-grained GPU > > > > > management, > > > > > > > > where > > > > > > > > > we need to maintain GPUManager as a service and put GPU > > info > > > > in > > > > > > slot > > > > > > > > > profiles. But at least for now it's not necessary to > > introduce > > > > > > such > > > > > > > > > complexity. > > > > > > > > > > > > > > > > > > However, I have some concerns on excluding GPUManager from > > > > > > > RuntimeContext > > > > > > > > > and let operators access it directly. > > > > > > > > > > > > > > > > > > - Configurations needed for creating the GPUManager is > not > > > > > always > > > > > > > > > available for operators. 
> > > > > > > > > - If later we want to have fine-grained control over GPU > > > > (e.g., > > > > > > > > > operators in each slot can only see GPUs reserved for > that > > > > > slot), > > > > > > > the > > > > > > > > > approach cannot be easily extended. > > > > > > > > > > > > > > > > > > I would suggest to wrap the GPUManager behind > RuntimeContext > > and > > > > > only > > > > > > > > > expose the GPUInfo to users. For now, we can declare a > method > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a default definition > > that > > > > > > calls > > > > > > > > > `GPUManager.get()` to get the lazily-created GPUManager. If > > later > > > > > we > > > > > > > want > > > > > > > > > to create / retrieve GPUManager in a different way, we can > > simply > > > > > > > change > > > > > > > > > how `getGPUInfo` is implemented, without needing to change > > any > > > > > public > > > > > > > > > interfaces. > > > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo < > > karma...@gmail.com> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > @Shephan > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense to share the > > GPU > > > > > > Manager > > > > > > > > > > in such scenario. > > > > > > > > > > If that's what you worry about, I'm +1 for holding > > > > > > > > > > GPUManager(ExternalResourceManagers) in TaskExecutor > > instead of > > > > > > > > > > TaskManagerServices. > > > > > > > > > > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext, it just > > holds the > > > > > GPU > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's the only > > place we > > > > > > could > > > > > > > > > > pass GPU info to the RichFunction/UserDefinedFunction. > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Yangze Guo > > > > > > > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried < > > > > > > is...@paddlesoft.net > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000 > se...@apache.org > > > > wrote > > > > > > > ---- > > > > > > > > > > > > > > > > > > > > > > > > Can we somehow keep this out of the TaskManager > > services > > > > > > > > > > > > I fear that we could not. IMO, the GPUManager(or > > > > > > > > > > > > ExternalServicesManagers in future) is conceptually > > one of > > > > > the > > > > > > > task > > > > > > > > > > > > manager services, just like MemoryManager before > 1.10. > > > > > > > > > > > > - It maintains/holds the GPU resource at TM level and > > all > > > > of > > > > > > the > > > > > > > > > > > > operators allocate the GPU resources from it. So, it > > should > > > > > be > > > > > > > > > > > > exclusive to a single TaskExecutor. > > > > > > > > > > > > - We could add a collection called > > ExternalResourceManagers > > > > > to > > > > > > > hold > > > > > > > > > > > > all managers of other external resources in the > future. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Can you help me understand why this needs the addition > in > > > > > > > > > > TaskMagerServices > > > > > > > > > > > or in the RuntimeContext? 
> > > > > > > > > > > Are you worried about the case when multiple Task > > Executors > > > > run > > > > > > in > > > > > > > > the > > > > > > > > > > same > > > > > > > > > > > JVM? That's not common, but wouldn't it actually be > good > > in > > > > > that > > > > > > > case > > > > > > > > > to > > > > > > > > > > > share the GPU Manager, given that the GPU is shared? > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Stephan > > > > > > > > > > > > > > > > > > > > > > --------------------------- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > What parts need information about this? > > > > > > > > > > > > In this FLIP, operators need the information. Thus, > we > > > > expose > > > > > > GPU > > > > > > > > > > > > information to the RuntimeContext/FunctionContext. > The > > slot > > > > > > > profile > > > > > > > > > is > > > > > > > > > > > > not aware of GPU resources as GPU is TM level > resource > > now. > > > > > > > > > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self contained" thing > that > > > > simply > > > > > > > takes > > > > > > > > > the > > > > > > > > > > > > configuration, and then abstracts everything > > internally? > > > > > > > > > > > > Yes, we just pass the path/args of the discover > script > > and > > > > > how > > > > > > > many > > > > > > > > > > > > GPUs per TM to it. It takes the responsibility to get > > the > > > > GPU > > > > > > > > > > > > information and expose them to the > > > > > > RuntimeContext/FunctionContext > > > > > > > > of > > > > > > > > > > > > Operators. Meanwhile, we'd better not allow operators > > to > > > > > > directly > > > > > > > > > > > > access GPUManager, it should get what they want from > > > > Context. > > > > > > We > > > > > > > > > could > > > > > > > > > > > > then decouple the interface/implementation of > > GPUManager > > > > and > > > > > > > Public > > > > > > > > > > > > API. > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > Yangze Guo > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen < > > > > > se...@apache.org > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > It sounds fine to initially start with GPU specific > > > > support > > > > > > and > > > > > > > > > think > > > > > > > > > > > > about > > > > > > > > > > > > > generalizing this once we better understand the > > space. > > > > > > > > > > > > > > > > > > > > > > > > > > About the implementation suggested in FLIP-108: > > > > > > > > > > > > > - Can we somehow keep this out of the TaskManager > > > > services? > > > > > > > > > Anything > > > > > > > > > > we > > > > > > > > > > > > > have to pull through all layers of the TM makes the > > TM > > > > > > > components > > > > > > > > > yet > > > > > > > > > > > > more > > > > > > > > > > > > > complex and harder to maintain. > > > > > > > > > > > > > > > > > > > > > > > > > > - What parts need information about this? > > > > > > > > > > > > > -> do the slot profiles need information about the > > GPU? > > > > > > > > > > > > > -> Can the GPU Manager be a "self contained" thing > > that > > > > > > simply > > > > > > > > > takes > > > > > > > > > > > > > the configuration, and then abstracts everything > > > > > internally? > > > > > > > > > > Operators > > > > > > > > > > > > can > > > > > > > > > > > > > access it via "GPUManager.get()" or so? 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo < > > > > > > karma...@gmail.com> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for all the feedbacks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > @Becket > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're right, > > I'll add > > > > > > them > > > > > > > to > > > > > > > > > the > > > > > > > > > > > > > > Public API section. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > @Stephan @Becket > > > > > > > > > > > > > > Regarding the general extended resource > mechanism, > > I > > > > > second > > > > > > > > > > Xintong's > > > > > > > > > > > > > > suggestion. > > > > > > > > > > > > > > - It's better to leverage ResourceProfile and > > > > > ResourceSpec > > > > > > > > after > > > > > > > > > we > > > > > > > > > > > > > > supporting fine-grained GPU scheduling. As a > first > > step > > > > > > > > > proposal, I > > > > > > > > > > > > > > prefer to not include it in the scope of this > FLIP. > > > > > > > > > > > > > > - Regarding the "Extended Resource Manager", if I > > > > > > understand > > > > > > > > > > > > > > correctly, it just a code refactoring atm, we > could > > > > > extract > > > > > > > the > > > > > > > > > > > > > > open/close/allocateExtendResources of GPUManager > to > > > > that > > > > > > > > > > interface. If > > > > > > > > > > > > > > that is the case, +1 to do it during > > implementation. > > > > > > > > > > > > > > > > > > > > > > > > > > > > @Xingbo > > > > > > > > > > > > > > As Xintong said, we looked into how Spark > supports > > a > > > > > > general > > > > > > > > > > "Custom > > > > > > > > > > > > > > Resource Scheduling" before and decided to > > introduce a > > > > > > common > > > > > > > > > > resource > > > > > > > > > > > > > > configuration > > > > > > > > > > > > > > > > > > > > > > > > > > > schema(taskmanager.resource.{resourceName}.amount/discovery-script) > > > > > > > > > > > > > > to make it more extensible. I think the > "resource" > > is a > > > > > > > proper > > > > > > > > > > level > > > > > > > > > > > > > > to contain all the configs of extended resources. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > Yangze Guo > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang < > > > > > > > > hxbks...@gmail.com > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource management > > > > support > > > > > > will > > > > > > > > > > greatly > > > > > > > > > > > > > > > facilitate the development of AI-related > > applications > > > > > by > > > > > > > > > PyFlink > > > > > > > > > > > > users. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I have only one comment about this wiki: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU > > configurations, I > > > > > > think > > > > > > > it > > > > > > > > > is > > > > > > > > > > > > better > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > delete the resource field makes it consistent > > with > > > > the > > > > > > > names > > > > > > > > of > > > > > > > > > > other > > > > > > > > > > > > > > > resource-related configurations in > > TaskManagerOption. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > e.g. > > taskmanager.resource.gpu.discovery-script.path > > > > -> > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xingbo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song <tonysong...@gmail.com> > > 于2020年3月4日周三 > > > > > > > 上午10:39写道: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also had an > > offline > > > > > > > discussion > > > > > > > > > > about > > > > > > > > > > > > > > making > > > > > > > > > > > > > > > > the "GPU Support" as some general "Extended > > > > Resource > > > > > > > > > Support". > > > > > > > > > > We > > > > > > > > > > > > > > believe > > > > > > > > > > > > > > > > supporting extended resources in a general > > > > mechanism > > > > > is > > > > > > > > > > definitely > > > > > > > > > > > > a > > > > > > > > > > > > > > good > > > > > > > > > > > > > > > > and extensible way. The reason we propose > this > > FLIP > > > > > > > > narrowing > > > > > > > > > > its > > > > > > > > > > > > scope > > > > > > > > > > > > > > > > down to GPU alone, is mainly for the concern > on > > > > extra > > > > > > > > efforts > > > > > > > > > > and > > > > > > > > > > > > > > review > > > > > > > > > > > > > > > > capacity needed for a general mechanism. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To come up with a well design on a general > > extended > > > > > > > > resource > > > > > > > > > > > > management > > > > > > > > > > > > > > > > mechanism, we would need to investigate more > > on how > > > > > > > people > > > > > > > > > use > > > > > > > > > > > > > > different > > > > > > > > > > > > > > > > kind of resources in practice. For GPU, we > > learnt > > > > > such > > > > > > > > > > knowledge > > > > > > > > > > > > from > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > > experts, Becket and his team members. But for > > FPGA, > > > > > or > > > > > > > > other > > > > > > > > > > > > potential > > > > > > > > > > > > > > > > extended resources, we don't have such > > convenient > > > > > > > > information > > > > > > > > > > > > sources, > > > > > > > > > > > > > > > > making the investigation requires more > efforts, > > > > > which I > > > > > > > > tend > > > > > > > > > to > > > > > > > > > > > > think > > > > > > > > > > > > > > is > > > > > > > > > > > > > > > > not necessary atm. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked into how > > Spark > > > > > > > supports a > > > > > > > > > > general > > > > > > > > > > > > > > "Custom > > > > > > > > > > > > > > > > Resource Scheduling". 
Assuming we want to > have > > a > > > > > > similar > > > > > > > > > > general > > > > > > > > > > > > > > extended > > > > > > > > > > > > > > > > resource mechanism in the future, we believe > > that > > > > the > > > > > > > > current > > > > > > > > > > GPU > > > > > > > > > > > > > > support > > > > > > > > > > > > > > > > design can be easily extended, in an > > incremental > > > > way > > > > > > > > without > > > > > > > > > > too > > > > > > > > > > > > many > > > > > > > > > > > > > > > > reworks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - The most important part is probably user > > > > > interfaces. > > > > > > > > Spark > > > > > > > > > > > > offers > > > > > > > > > > > > > > > > configuration options to define the amount, > > > > discovery > > > > > > > > script > > > > > > > > > > and > > > > > > > > > > > > > > vendor > > > > > > > > > > > > > > > > (on > > > > > > > > > > > > > > > > k8s) in a per resource type bias [1], which > is > > very > > > > > > > similar > > > > > > > > > to > > > > > > > > > > > > what > > > > > > > > > > > > > > we > > > > > > > > > > > > > > > > proposed in this FLIP. I think it's not > > necessary > > > > to > > > > > > > expose > > > > > > > > > > > > config > > > > > > > > > > > > > > > > options > > > > > > > > > > > > > > > > in the general way atm, since we do not have > > > > supports > > > > > > for > > > > > > > > > other > > > > > > > > > > > > > > resource > > > > > > > > > > > > > > > > types now. If later we decided to have per > > resource > > > > > > type > > > > > > > > > config > > > > > > > > > > > > > > > > options, we > > > > > > > > > > > > > > > > can have backwards compatibility on the > current > > > > > > proposed > > > > > > > > > > options > > > > > > > > > > > > > > with > > > > > > > > > > > > > > > > simple key mapping. > > > > > > > > > > > > > > > > - For the GPU Manager, if later needed we can > > > > change > > > > > it > > > > > > > to > > > > > > > > a > > > > > > > > > > > > > > "Extended > > > > > > > > > > > > > > > > Resource Manager" (or whatever it is called). > > That > > > > > > should > > > > > > > > be > > > > > > > > > a > > > > > > > > > > > > pure > > > > > > > > > > > > > > > > component-internal refactoring. > > > > > > > > > > > > > > > > - For ResourceProfile and ResourceSpec, there > > are > > > > > > already > > > > > > > > > > > > fields for > > > > > > > > > > > > > > > > general extended resource. We can of course > > > > leverage > > > > > > them > > > > > > > > > when > > > > > > > > > > > > > > > > supporting > > > > > > > > > > > > > > > > fine grained GPU scheduling. That is also not > > in > > > > the > > > > > > > scope > > > > > > > > of > > > > > > > > > > > > this > > > > > > > > > > > > > > first > > > > > > > > > > > > > > > > step proposal, and would require FLIP-56 to > be > > > > > finished > > > > > > > > > first. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > To summary up, I agree with Becket that have > a > > > > > separate > > > > > > > > FLIP > > > > > > > > > > for > > > > > > > > > > > > the > > > > > > > > > > > > > > > > general extended resource mechanism, and keep > > it in > > > > > > mind > > > > > > > > when > > > > > > > > > > > > > > discussing > > > > > > > > > > > > > > > > and implementing the current one. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin < > > > > > > > > > > becket....@gmail.com> > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It makes > total > > > > sense > > > > > to > > > > > > > > > > generalize > > > > > > > > > > > > the > > > > > > > > > > > > > > > > > resource management to support custom > > resources. > > > > > > Having > > > > > > > > > that > > > > > > > > > > > > allows > > > > > > > > > > > > > > users > > > > > > > > > > > > > > > > > to add new resources by themselves. The > > general > > > > > > > resource > > > > > > > > > > > > management > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > > > involve two different aspects: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type definition. It > is > > > > > > supported > > > > > > > > by > > > > > > > > > > the > > > > > > > > > > > > > > extended > > > > > > > > > > > > > > > > > resources in ResourceProfile and > > ResourceSpec. > > > > This > > > > > > > will > > > > > > > > > > likely > > > > > > > > > > > > cover > > > > > > > > > > > > > > > > > majority of the cases. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation logic, > > i.e. how > > > > > to > > > > > > > > assign > > > > > > > > > > the > > > > > > > > > > > > > > resources > > > > > > > > > > > > > > > > > to different tasks, operators, and so on. > > This > > > > may > > > > > > > > require > > > > > > > > > > two > > > > > > > > > > > > > > levels / > > > > > > > > > > > > > > > > > steps: > > > > > > > > > > > > > > > > > a. Subtask level - make sure the subtasks > > are put > > > > > > into > > > > > > > > > > > > suitable > > > > > > > > > > > > > > > > slots. > > > > > > > > > > > > > > > > > It is done by the global RM and is not > > > > customizable > > > > > > > right > > > > > > > > > > now. > > > > > > > > > > > > > > > > > b. Operator level - map the exact resource > > to the > > > > > > > > operators > > > > > > > > > > > > in > > > > > > > > > > > > > > TM. > > > > > > > > > > > > > > > > e.g. > > > > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for operator B. > > This > > > > > step > > > > > > > is > > > > > > > > > > needed > > > > > > > > > > > > > > assuming > > > > > > > > > > > > > > > > > the global RM does not distinguish > individual > > > > > > resources > > > > > > > > of > > > > > > > > > > the > > > > > > > > > > > > same > > > > > > > > > > > > > > type. > > > > > > > > > > > > > > > > > It is true for memory, but not for GPU. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to do 2.b here. 
> > So it > > > > > > > should > > > > > > > > > > > > discover the > > > > > > > > > > > > > > > > > physical GPU information and bind/match > them > > to > > > > > each > > > > > > > > > > operators. > > > > > > > > > > > > > > Making > > > > > > > > > > > > > > > > this > > > > > > > > > > > > > > > > > general will fill in the missing piece to > > support > > > > > > > custom > > > > > > > > > > resource > > > > > > > > > > > > > > type > > > > > > > > > > > > > > > > > definition. But I'd avoid calling it a > > "External > > > > > > > Resource > > > > > > > > > > > > Manager" to > > > > > > > > > > > > > > > > avoid > > > > > > > > > > > > > > > > > confusion with RM, maybe something like > > "Operator > > > > > > > > Resource > > > > > > > > > > > > Assigner" > > > > > > > > > > > > > > > > would > > > > > > > > > > > > > > > > > be more accurate. So for each resource type > > users > > > > > can > > > > > > > > have > > > > > > > > > an > > > > > > > > > > > > > > optional > > > > > > > > > > > > > > > > > "Operator Resource Assigner" in the TM. For > > > > memory, > > > > > > > users > > > > > > > > > > don't > > > > > > > > > > > > need > > > > > > > > > > > > > > > > this, > > > > > > > > > > > > > > > > > but for other extended resources, users may > > need > > > > > > that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable "Operator > > Resource > > > > > > > > Assigner" > > > > > > > > > > is > > > > > > > > > > > > > > achievable > > > > > > > > > > > > > > > > > in this FLIP. But I am also OK with having > > that > > > > in > > > > > a > > > > > > > > > separate > > > > > > > > > > > > FLIP > > > > > > > > > > > > > > > > because > > > > > > > > > > > > > > > > > the interface between the "Operator > Resource > > > > > > Assigner" > > > > > > > > and > > > > > > > > > > > > operator > > > > > > > > > > > > > > may > > > > > > > > > > > > > > > > > take a while to settle down if we want to > > make it > > > > > > > > generic. > > > > > > > > > > But I > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > our > > > > > > > > > > > > > > > > > implementation should take this future work > > into > > > > > > > > > > consideration so > > > > > > > > > > > > > > that we > > > > > > > > > > > > > > > > > don't need to break backwards compatibility > > once > > > > we > > > > > > > have > > > > > > > > > > that. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan > Ewen > > < > > > > > > > > > > se...@apache.org> > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you for writing this FLIP. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input into the > > > > > mechanics > > > > > > of > > > > > > > > > > GPU-aware > > > > > > > > > > > > > > > > > scheduling > > > > > > > > > > > > > > > > > > and GPU allocation, as I have no > experience > > > > with > > > > > > > that. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading the > > proposal is > > > > if > > > > > > it > > > > > > > > > makes > > > > > > > > > > > > sense to > > > > > > > > > > > > > > > > look > > > > > > > > > > > > > > > > > at > > > > > > > > > > > > > > > > > > the "GPU Manager" as an "External > Resource > > > > > > Manager", > > > > > > > > and > > > > > > > > > > GPU > > > > > > > > > > > > is one > > > > > > > > > > > > > > > > such > > > > > > > > > > > > > > > > > > resource. > > > > > > > > > > > > > > > > > > The way I understand the ResourceProfile > > and > > > > > > > > > ResourceSpec, > > > > > > > > > > > > that is > > > > > > > > > > > > > > how > > > > > > > > > > > > > > > > it > > > > > > > > > > > > > > > > > > is done there. > > > > > > > > > > > > > > > > > > It has the advantage that it looks more > > > > > extensible. > > > > > > > > Maybe > > > > > > > > > > > > there is > > > > > > > > > > > > > > a > > > > > > > > > > > > > > > > GPU > > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA GPU > > Resource, > > > > and > > > > > > FPGA > > > > > > > > > > > > Resource, a > > > > > > > > > > > > > > > > Alibaba > > > > > > > > > > > > > > > > > > TPU Resource, etc. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > > > > Stephan > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket > Qin < > > > > > > > > > > > > becket....@gmail.com> > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze. GPU > resource > > > > > > management > > > > > > > > > > support > > > > > > > > > > > > is a > > > > > > > > > > > > > > > > > > must-have > > > > > > > > > > > > > > > > > > > for machine learning use cases. > Actually > > it > > > > is > > > > > > one > > > > > > > of > > > > > > > > > the > > > > > > > > > > > > mostly > > > > > > > > > > > > > > > > asked > > > > > > > > > > > > > > > > > > > question from the users who are > > interested in > > > > > > using > > > > > > > > > Flink > > > > > > > > > > > > for ML. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Some quick comments / questions to the > > wiki. > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API should probably > > also > > > > be > > > > > > > > > > mentioned in > > > > > > > > > > > > the > > > > > > > > > > > > > > > > public > > > > > > > > > > > > > > > > > > > interface section. > > > > > > > > > > > > > > > > > > > 2. Is the data structure that holds GPU > > info > > > > > > also a > > > > > > > > > > public > > > > > > > > > > > > API? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong > > Song > > > > < > > > > > > > > > > > > > > tonysong...@gmail.com> > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the FLIP and > > kicking > > > > off > > > > > > the > > > > > > > > > > > > discussion, > > > > > > > > > > > > > > > > Yangze. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature. 
Supporting > > using > > > > of > > > > > > GPU > > > > > > > in > > > > > > > > > > Flink > > > > > > > > > > > > is > > > > > > > > > > > > > > > > > > significant, > > > > > > > > > > > > > > > > > > > > especially for the ML scenarios. > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki doc and > it > > > > looks > > > > > > good > > > > > > > > to > > > > > > > > > > me. I > > > > > > > > > > > > > > think > > > > > > > > > > > > > > > > > it's a > > > > > > > > > > > > > > > > > > > > very good first step for Flink's GPU > > > > > supports. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Thank you~ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM > Yangze > > Guo > > > > < > > > > > > > > > > > > karma...@gmail.com > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a discussion > > > > thread > > > > > on > > > > > > > > > > "FLIP-108: > > > > > > > > > > > > Add > > > > > > > > > > > > > > GPU > > > > > > > > > > > > > > > > > > > > > support in Flink"[1]. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly discusses the > > following > > > > > > > issues: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to configure how many > > GPUs > > > > > in a > > > > > > > > task > > > > > > > > > > > > executor > > > > > > > > > > > > > > and > > > > > > > > > > > > > > > > > > > > > forward such requirements to the > > external > > > > > > > > resource > > > > > > > > > > > > managers > > > > > > > > > > > > > > (for > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos setups). > > > > > > > > > > > > > > > > > > > > > - Provide information of available > > GPU > > > > > > > resources > > > > > > > > to > > > > > > > > > > > > > > operators. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in the FLIP > are > > as > > > > > > > follows: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource requirements > > to > > > > > > > > > > Yarn/Kubernetes. > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as one of > the > > task > > > > > > > manager > > > > > > > > > > > > services to > > > > > > > > > > > > > > > > > > discover > > > > > > > > > > > > > > > > > > > > > and expose GPU resource information > > to > > > > the > > > > > > > > context > > > > > > > > > of > > > > > > > > > > > > > > functions. > > > > > > > > > > > > > > > > > > > > > - Introduce the default script for > > GPU > > > > > > > discovery, > > > > > > > > > in > > > > > > > > > > > > which we > > > > > > > > > > > > > > > > > provide > > > > > > > > > > > > > > > > > > > > > the privilege mode to help user to > > > > achieve > > > > > > > > > > worker-level > > > > > > > > > > > > > > isolation > > > > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > > > > > > > standalone mode. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details in the > FLIP > > wiki > > > > > > > > document > > > > > > > > > > [1]. > > > > > > > > > > > > > > Looking > > > > > > > > > > > > > > > > > > forward > > > > > > > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > > > > > > > your feedbacks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > > > > > > > Yangze Guo > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
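
P.S. To make the suggestion above a bit more concrete, here is a very rough
sketch of the interface segregation I have in mind. The
addExternalResourceToRequest signature is the one currently in the FLIP;
all other names, signatures and module placements are placeholders for
illustration only and not a proposal for the final API. The only point is
that the Yarn-specific call (and with it the Hadoop dependency) and the
Kubernetes-specific pod decoration live in separate, optional interfaces
which a driver implements only if it supports the respective deployment
target:

// Sketch only; each interface would live in its own file/module.
import java.util.Set;
import org.apache.hadoop.yarn.client.api.AMRMClient;  // needed only by the Yarn-specific interface
import io.fabric8.kubernetes.api.model.Pod;           // assuming the fabric8 pod model here

/** Deployment-agnostic base interface for one type of external resource. */
public interface ExternalResourceDriver {
    /**
     * TM side: discover the resources assigned to this TaskExecutor, in the
     * spirit of the ResourceInfoProvider discussed earlier in this thread.
     * ExternalResourceInfo is the info type proposed in the FLIP.
     */
    Set<ExternalResourceInfo> retrieveResourceInfo(long amount) throws Exception;
}

/**
 * Optional Yarn-specific part; would live in the Yarn module so that Hadoop
 * stays off the classpath unless a driver actually implements it.
 */
public interface YarnExternalResourceDriver extends ExternalResourceDriver {
    void addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest);
}

/** Optional Kubernetes-specific part: decorate the TaskManager pod. */
public interface KubernetesExternalResourceDriver extends ExternalResourceDriver {
    Pod decorateTaskManagerPod(Pod taskManagerPod);
}

The Yarn / Kubernetes specific code would then simply check whether a
configured driver supports its deployment target and skip it (or log a
warning) otherwise, e.g.:

if (driver instanceof YarnExternalResourceDriver) {
    ((YarnExternalResourceDriver) driver).addExternalResourceToRequest(containerRequest);
}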