@Yangze, I think what Stephan means (@Stephan, please correct me if I'm wrong) is that we might not need to hold and maintain the GPUManager as a service in TaskManagerServices or RuntimeContext. An alternative is to create / retrieve the GPUManager only in the operators that need it, e.g., with a static method `GPUManager.get()`.
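For illustration, a minimal sketch of what such a static accessor could look like, assuming a hypothetical, self-contained GPUManager that performs discovery lazily on first access; the names, fields, and discovery logic below are illustrative only and not part of the FLIP:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a self-contained GPU manager with a lazily-initialized
// static accessor: one possible shape of the "GPUManager.get()" idea above.
public final class GPUManager {

    /** Minimal placeholder for per-GPU information (e.g. the device index). */
    public static final class GPUInfo {
        private final int index;

        public GPUInfo(int index) {
            this.index = index;
        }

        public int getIndex() {
            return index;
        }
    }

    private static volatile GPUManager instance;

    private final List<GPUInfo> gpuInfos;

    private GPUManager(List<GPUInfo> gpuInfos) {
        this.gpuInfos = Collections.unmodifiableList(gpuInfos);
    }

    /** Lazily creates the manager on first access (double-checked locking). */
    public static GPUManager get() {
        if (instance == null) {
            synchronized (GPUManager.class) {
                if (instance == null) {
                    instance = new GPUManager(discoverGPUs());
                }
            }
        }
        return instance;
    }

    /** GPU information visible to all operators in this TaskManager's JVM. */
    public List<GPUInfo> getGPUInfos() {
        return gpuInfos;
    }

    private static List<GPUInfo> discoverGPUs() {
        // A real implementation would run the configured discovery script
        // (path/args and GPU amount from the TaskManager configuration) and
        // parse its output; here an empty list stands in as a placeholder.
        return Collections.emptyList();
    }
}
```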
@Stephan, I agree with you on excluding GPUManager from TaskManagerServices.
- For the first step, where we provide unified TM-level GPU information to all operators, it should be fine to have operators access / lazily initialize the GPUManager by themselves.
- In the future, we might have more fine-grained GPU management, where we would need to maintain the GPUManager as a service and put GPU info in slot profiles. But at least for now it's not necessary to introduce such complexity.

However, I have some concerns about excluding GPUManager from RuntimeContext and letting operators access it directly.
- The configuration needed for creating the GPUManager is not always available to operators.
- If we later want fine-grained control over GPUs (e.g., operators in each slot can only see the GPUs reserved for that slot), this approach cannot be easily extended.

I would suggest wrapping the GPUManager behind RuntimeContext and only exposing the GPUInfo to users. For now, we can declare a method `getGPUInfo()` in RuntimeContext, with a default implementation that calls `GPUManager.get()` to obtain the lazily-created GPUManager. If we later want to create / retrieve the GPUManager in a different way, we can simply change how `getGPUInfo` is implemented, without changing any public interfaces.

Thank you~

Xintong Song

On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <karma...@gmail.com> wrote:

> @Stephan
> Do you mean MiniCluster? Yes, it makes sense to share the GPU Manager in such a scenario.
> If that's what you are worried about, I'm +1 for holding the GPUManager (ExternalResourceManagers) in TaskExecutor instead of TaskManagerServices.
>
> Regarding the RuntimeContext/FunctionContext, it just holds the GPU info instead of the GPU Manager. AFAIK, it's the only place where we could pass GPU info to the RichFunction/UserDefinedFunction.
>
> Best,
> Yangze Guo
>
> On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <is...@paddlesoft.net> wrote:
> >
> > ---- On Fri, 13 Mar 2020 15:58:20 +0000 se...@apache.org wrote ----
> >
> > > > Can we somehow keep this out of the TaskManager services?
> > > I fear that we could not. IMO, the GPUManager (or ExternalServicesManagers in the future) is conceptually one of the task manager services, just like the MemoryManager before 1.10.
> > > - It maintains/holds the GPU resources at the TM level and all of the operators allocate GPU resources from it. So, it should be exclusive to a single TaskExecutor.
> > > - We could add a collection called ExternalResourceManagers to hold all managers of other external resources in the future.
> >
> > Can you help me understand why this needs the addition in TaskManagerServices or in the RuntimeContext?
> > Are you worried about the case when multiple Task Executors run in the same JVM? That's not common, but wouldn't it actually be good in that case to share the GPU Manager, given that the GPU is shared?
> >
> > Thanks,
> > Stephan
> >
> > ---------------------------
> >
> > > > What parts need information about this?
> > > In this FLIP, operators need the information. Thus, we expose GPU information to the RuntimeContext/FunctionContext. The slot profile is not aware of GPU resources as GPU is a TM-level resource for now.
> >
> > > > Can the GPU Manager be a "self contained" thing that simply takes the configuration, and then abstracts everything internally?
> > > Yes, we just pass the path/args of the discovery script and how many GPUs per TM to it. It takes the responsibility to get the GPU information and expose it to the RuntimeContext/FunctionContext of operators. Meanwhile, we'd better not allow operators to directly access the GPUManager; they should get what they want from the Context. We could then decouple the interface/implementation of GPUManager from the public API.
> > >
> > > Best,
> > > Yangze Guo
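Building on the `getGPUInfo()` suggestion above, a minimal sketch of how a default method could hide the GPUManager behind the runtime context; `GPUEnabledRuntimeContext` is a hypothetical name reusing the GPUManager sketch shown earlier, not Flink's actual RuntimeContext API:

```java
import java.util.List;

// Hypothetical sketch: expose only GPU information through the runtime context,
// with a default implementation that delegates to the lazily-created manager.
public interface GPUEnabledRuntimeContext {

    /**
     * Returns the GPU information visible to this operator. The default
     * implementation hides how the GPUManager is created or retrieved, so it
     * could later be swapped for slot-scoped, fine-grained GPU assignment
     * without changing this public method.
     */
    default List<GPUManager.GPUInfo> getGPUInfo() {
        return GPUManager.get().getGPUInfos();
    }
}
```

With this shape, a user function would only ever see the GPU info, never the manager itself.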
> > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > It sounds fine to initially start with GPU-specific support and think about generalizing this once we better understand the space.
> > > >
> > > > About the implementation suggested in FLIP-108:
> > > > - Can we somehow keep this out of the TaskManager services? Anything we have to pull through all layers of the TM makes the TM components yet more complex and harder to maintain.
> > > > - What parts need information about this?
> > > >   -> Do the slot profiles need information about the GPU?
> > > >   -> Can the GPU Manager be a "self contained" thing that simply takes the configuration, and then abstracts everything internally? Operators can access it via "GPUManager.get()" or so?
> > > >
> > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > >
> > > > > Thanks for all the feedback.
> > > > >
> > > > > @Becket
> > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the Public API section.
> > > > >
> > > > > @Stephan @Becket
> > > > > Regarding the general extended resource mechanism, I second Xintong's suggestion.
> > > > > - It's better to leverage ResourceProfile and ResourceSpec after we support fine-grained GPU scheduling. As a first-step proposal, I prefer not to include it in the scope of this FLIP.
> > > > > - Regarding the "Extended Resource Manager", if I understand correctly, it is just a code refactoring atm; we could extract the open/close/allocateExtendResources of GPUManager to that interface. If that is the case, +1 to do it during implementation.
> > > > >
> > > > > @Xingbo
> > > > > As Xintong said, we looked into how Spark supports a general "Custom Resource Scheduling" before and decided to introduce a common resource configuration schema (taskmanager.resource.{resourceName}.amount/discovery-script) to make it more extensible. I think "resource" is a proper level to contain all the configs of extended resources.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hxbks...@gmail.com> wrote:
> > > > > >
> > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > >
> > > > > > There is no doubt that GPU resource management support will greatly facilitate the development of AI-related applications by PyFlink users.
> > > > > >
> > > > > > I have only one comment about this wiki:
> > > > > >
> > > > > > Regarding the names of several GPU configurations, I think it is better to drop the "resource" field to make them consistent with the names of other resource-related configurations in TaskManagerOptions, e.g.
> > > > > > taskmanager.resource.gpu.discovery-script.path -> taskmanager.gpu.discovery-script.path
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xingbo
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > >
> > > > > > > @Stephan, @Becket,
> > > > > > >
> > > > > > > Actually, Yangze, Yang and I also had an offline discussion about making the "GPU Support" a more general "Extended Resource Support". We believe supporting extended resources with a general mechanism is definitely a good and extensible way. The reason we propose this FLIP narrowing its scope down to GPU alone is mainly the concern about the extra effort and review capacity needed for a general mechanism.
> > > > > > >
> > > > > > > To come up with a good design for a general extended resource management mechanism, we would need to investigate more on how people use different kinds of resources in practice. For GPU, we learnt such knowledge from the experts, Becket and his team members. But for FPGA, or other potential extended resources, we don't have such convenient information sources, which makes the investigation require more effort, and I tend to think that is not necessary atm.
> > > > > > >
> > > > > > > On the other hand, we also looked into how Spark supports a general "Custom Resource Scheduling". Assuming we want to have a similar general extended resource mechanism in the future, we believe that the current GPU support design can be easily extended, in an incremental way without too much rework.
> > > > > > >
> > > > > > > - The most important part is probably the user interfaces. Spark offers configuration options to define the amount, discovery script and vendor (on K8s) on a per-resource-type basis [1], which is very similar to what we proposed in this FLIP. I think it's not necessary to expose config options in the general way atm, since we do not have support for other resource types now. If we later decide to have per-resource-type config options, we can keep backwards compatibility with the currently proposed options through simple key mapping.
> > > > > > > - For the GPU Manager, if later needed we can change it to an "Extended Resource Manager" (or whatever it is called). That should be a pure component-internal refactoring.
> > > > > > > - For ResourceProfile and ResourceSpec, there are already fields for general extended resources. We can of course leverage them when supporting fine-grained GPU scheduling. That is also not in the scope of this first-step proposal, and would require FLIP-56 to be finished first.
> > > > > > >
> > > > > > > To sum up, I agree with Becket that we should have a separate FLIP for the general extended resource mechanism, and keep it in mind when discussing and implementing the current one.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > > [1]
> > > > > > > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
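As a concrete illustration of the per-resource-type configuration schema mentioned above (taskmanager.resource.{resourceName}.amount / discovery-script), here is a rough sketch using Flink's ConfigOptions builder; the exact key names, types, and defaults are still subject to the FLIP discussion:

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Sketch of possible GPU config options following the
// "taskmanager.resource.{resourceName}.*" schema discussed in this thread.
public class GPUResourceOptions {

    public static final ConfigOption<Integer> GPU_AMOUNT =
            ConfigOptions.key("taskmanager.resource.gpu.amount")
                    .intType()
                    .defaultValue(0)
                    .withDescription("Number of GPUs per TaskExecutor.");

    public static final ConfigOption<String> DISCOVERY_SCRIPT_PATH =
            ConfigOptions.key("taskmanager.resource.gpu.discovery-script.path")
                    .stringType()
                    .noDefaultValue()
                    .withDescription("Path of the GPU discovery script.");

    public static final ConfigOption<String> DISCOVERY_SCRIPT_ARGS =
            ConfigOptions.key("taskmanager.resource.gpu.discovery-script.args")
                    .stringType()
                    .noDefaultValue()
                    .withDescription("Arguments passed to the GPU discovery script.");
}
```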
> > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > That's a good point, Stephan. It makes total sense to generalize the resource management to support custom resources. Having that allows users to add new resources by themselves. The general resource management may involve two different aspects:
> > > > > > > >
> > > > > > > > 1. The custom resource type definition. It is supported by the extended resources in ResourceProfile and ResourceSpec. This will likely cover the majority of cases.
> > > > > > > >
> > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the resources to different tasks, operators, and so on. This may require two levels / steps:
> > > > > > > >   a. Subtask level - make sure the subtasks are put into suitable slots. It is done by the global RM and is not customizable right now.
> > > > > > > >   b. Operator level - map the exact resources to the operators in the TM, e.g. GPU 1 for operator A, GPU 2 for operator B. This step is needed assuming the global RM does not distinguish individual resources of the same type. It is true for memory, but not for GPU.
> > > > > > > >
> > > > > > > > The GPU manager is designed to do 2.b here. So it should discover the physical GPU information and bind/match it to each operator. Making this general will fill in the missing piece to support custom resource type definition. But I'd avoid calling it an "External Resource Manager" to avoid confusion with the RM; maybe something like "Operator Resource Assigner" would be more accurate. So for each resource type users can have an optional "Operator Resource Assigner" in the TM. For memory, users don't need this, but for other extended resources, users may need it.
> > > > > > > >
> > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is achievable in this FLIP. But I am also OK with having that in a separate FLIP, because the interface between the "Operator Resource Assigner" and the operator may take a while to settle down if we want to make it generic. But I think our implementation should take this future work into consideration so that we don't need to break backwards compatibility once we have that.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
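Since Becket notes the interface may take a while to settle, the following is purely a strawman of what a pluggable per-resource-type assigner could look like; all names and methods are hypothetical and not part of the FLIP:

```java
import java.util.List;

// Strawman for the "Operator Resource Assigner" idea: one assigner per extended
// resource type, living in the TaskManager, mapping concrete resource instances
// (e.g. GPU indexes) to operators. Illustrative only.
public interface OperatorResourceAssigner<R> {

    /** The extended resource type this assigner handles, e.g. "gpu" or "fpga". */
    String resourceType();

    /** Discovers the concrete resource instances available to this TaskManager. */
    List<R> discoverResources();

    /**
     * Assigns some of the discovered resources to one operator, e.g. when the
     * operator is opened. Whether instances are exclusive or shared between
     * operators is up to the concrete assigner.
     */
    List<R> assignToOperator(String operatorId, int amount);

    /** Releases the resources previously assigned to the given operator. */
    void releaseForOperator(String operatorId);
}
```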
> > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <se...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > >
> > > > > > > > > I cannot really give much input into the mechanics of GPU-aware scheduling and GPU allocation, as I have no experience with that.
> > > > > > > > >
> > > > > > > > > One thought I had when reading the proposal is whether it makes sense to look at the "GPU Manager" as an "External Resource Manager", with GPU being one such resource.
> > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how it is done there.
> > > > > > > > > It has the advantage that it looks more extensible. Maybe there is a GPU Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba TPU Resource, etc.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stephan
> > > > > > > > >
> > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks for the FLIP, Yangze. GPU resource management support is a must-have for machine learning use cases. Actually, it is one of the most frequently asked questions from users who are interested in using Flink for ML.
> > > > > > > > > >
> > > > > > > > > > Some quick comments / questions on the wiki:
> > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the public interface section.
> > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > > > >
> > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink is significant, especially for ML scenarios.
> > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think it's a very good first step for Flink's GPU support.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karma...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU support in Flink" [1].
> > > > > > > > > > > >
> > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > >
> > > > > > > > > > > > - Enable users to configure how many GPUs a task executor has and forward such requirements to the external resource managers (for Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > - Provide information about available GPU resources to operators.
> > > > > > > > > > > >
> > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover and expose GPU resource information to the context of functions.
> > > > > > > > > > > > - Introduce a default script for GPU discovery, in which we provide a privilege mode to help users achieve worker-level isolation in standalone mode.
> > > > > > > > > > > >
> > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward to your feedback.
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
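To make the discovery step in the key changes above a bit more concrete, here is a minimal sketch of how a GPUManager-like component might invoke a discovery script and parse its output; it assumes the script prints one GPU index per line, which is an illustrative contract only, since the actual output format, privilege mode, and error handling are defined in the FLIP wiki and may differ:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of running a GPU discovery script and collecting the
// reported GPU indexes. Illustrative only; not the FLIP's actual discovery code.
public final class GPUDiscoverySketch {

    /** Runs the discovery script with the requested amount and returns the GPU indexes it reports. */
    public static List<Integer> discover(String scriptPath, int amount)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(scriptPath, String.valueOf(amount)).start();
        List<Integer> indexes = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    // Assumed contract: one GPU index per non-empty output line.
                    indexes.add(Integer.parseInt(trimmed));
                }
            }
        }
        if (process.waitFor() != 0) {
            throw new IOException("GPU discovery script exited with a non-zero code.");
        }
        return indexes;
    }
}
```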