@Yangze, I think what Stephan means (@Stephan, please correct me if I'm wrong) is that we might not need to hold and maintain the GPUManager as a service in TaskManagerServices or RuntimeContext. An alternative is to create / retrieve the GPUManager only in the operators that need it, e.g., with a static method `GPUManager.get()`.
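For illustration, a minimal sketch of what such a static accessor could look like, assuming a hypothetical, self-contained GPUManager that performs discovery lazily on first access; the names, fields, and discovery logic below are illustrative only and not part of the FLIP:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a self-contained GPU manager with a lazily-initialized
// static accessor: one possible shape of the "GPUManager.get()" idea above.
public final class GPUManager {

    /** Minimal placeholder for per-GPU information (e.g. the device index). */
    public static final class GPUInfo {
        private final int index;

        public GPUInfo(int index) {
            this.index = index;
        }

        public int getIndex() {
            return index;
        }
    }

    private static volatile GPUManager instance;

    private final List<GPUInfo> gpuInfos;

    private GPUManager(List<GPUInfo> gpuInfos) {
        this.gpuInfos = Collections.unmodifiableList(gpuInfos);
    }

    /** Lazily creates the manager on first access (double-checked locking). */
    public static GPUManager get() {
        if (instance == null) {
            synchronized (GPUManager.class) {
                if (instance == null) {
                    instance = new GPUManager(discoverGPUs());
                }
            }
        }
        return instance;
    }

    /** GPU information visible to all operators in this TaskManager's JVM. */
    public List<GPUInfo> getGPUInfos() {
        return gpuInfos;
    }

    private static List<GPUInfo> discoverGPUs() {
        // A real implementation would run the configured discovery script
        // (path/args and GPU amount from the TaskManager configuration) and
        // parse its output; here an empty list stands in as a placeholder.
        return Collections.emptyList();
    }
}
```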
@Stephan, I agree with you on excluding GPUManager from TaskManagerServices.
- For the first step, where we provide unified TM-level GPU information to all operators, it should be fine to have operators access / lazily initialize the GPUManager by themselves.
- In the future, we might have more fine-grained GPU management, where we would need to maintain the GPUManager as a service and put GPU info in slot profiles. But at least for now it's not necessary to introduce such complexity.

However, I have some concerns about excluding GPUManager from RuntimeContext and letting operators access it directly.
- The configuration needed for creating the GPUManager is not always available to operators.
- If we later want fine-grained control over GPUs (e.g., operators in each slot can only see the GPUs reserved for that slot), this approach cannot be easily extended.

I would suggest wrapping the GPUManager behind RuntimeContext and only exposing the GPUInfo to users. For now, we can declare a method `getGPUInfo()` in RuntimeContext, with a default implementation that calls `GPUManager.get()` to obtain the lazily-created GPUManager. If we later want to create / retrieve the GPUManager in a different way, we can simply change how `getGPUInfo` is implemented, without changing any public interfaces.

Thank you~

Xintong Song

On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <karma...@gmail.com> wrote:

> @Stephan
> Do you mean MiniCluster? Yes, it makes sense to share the GPU Manager in such a scenario.
> If that's what you are worried about, I'm +1 for holding the GPUManager (ExternalResourceManagers) in TaskExecutor instead of TaskManagerServices.
>
> Regarding the RuntimeContext/FunctionContext, it just holds the GPU info instead of the GPU Manager. AFAIK, it's the only place where we could pass GPU info to the RichFunction/UserDefinedFunction.
>
> Best,
> Yangze Guo
>
> On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried <is...@paddlesoft.net> wrote:
> >
> > ---- On Fri, 13 Mar 2020 15:58:20 +0000 se...@apache.org wrote ----
> >
> > > > Can we somehow keep this out of the TaskManager services?
> > > I fear that we could not. IMO, the GPUManager (or ExternalServicesManagers in the future) is conceptually one of the task manager services, just like the MemoryManager before 1.10.
> > > - It maintains/holds the GPU resources at the TM level and all of the operators allocate GPU resources from it. So, it should be exclusive to a single TaskExecutor.
> > > - We could add a collection called ExternalResourceManagers to hold all managers of other external resources in the future.
> >
> > Can you help me understand why this needs the addition in TaskManagerServices or in the RuntimeContext?
> > Are you worried about the case when multiple Task Executors run in the same JVM? That's not common, but wouldn't it actually be good in that case to share the GPU Manager, given that the GPU is shared?
> >
> > Thanks,
> > Stephan
> >
> > ---------------------------
> >
> > > > What parts need information about this?
> > > In this FLIP, operators need the information. Thus, we expose GPU information to the RuntimeContext/FunctionContext. The slot profile is not aware of GPU resources as GPU is a TM-level resource for now.
> >
> > > > Can the GPU Manager be a "self contained" thing that simply takes the configuration, and then abstracts everything internally?
> > > Yes, we just pass the path/args of the discovery script and how many GPUs per TM to it. It takes the responsibility to get the GPU information and expose it to the RuntimeContext/FunctionContext of operators. Meanwhile, we'd better not allow operators to directly access the GPUManager; they should get what they want from the Context. We could then decouple the interface/implementation of GPUManager from the public API.
> > >
> > > Best,
> > > Yangze Guo
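Building on the `getGPUInfo()` suggestion above, a minimal sketch of how a default method could hide the GPUManager behind the runtime context; `GPUEnabledRuntimeContext` is a hypothetical name reusing the GPUManager sketch shown earlier, not Flink's actual RuntimeContext API:

```java
import java.util.List;

// Hypothetical sketch: expose only GPU information through the runtime context,
// with a default implementation that delegates to the lazily-created manager.
public interface GPUEnabledRuntimeContext {

    /**
     * Returns the GPU information visible to this operator. The default
     * implementation hides how the GPUManager is created or retrieved, so it
     * could later be swapped for slot-scoped, fine-grained GPU assignment
     * without changing this public method.
     */
    default List<GPUManager.GPUInfo> getGPUInfo() {
        return GPUManager.get().getGPUInfos();
    }
}
```

With this shape, a user function would only ever see the GPU info, never the manager itself.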
> > > On Fri, Mar 13, 2020 at 7:26 PM Stephan Ewen <se...@apache.org> wrote:
> > > >
> > > > It sounds fine to initially start with GPU-specific support and think about generalizing this once we better understand the space.
> > > >
> > > > About the implementation suggested in FLIP-108:
> > > > - Can we somehow keep this out of the TaskManager services? Anything we have to pull through all layers of the TM makes the TM components yet more complex and harder to maintain.
> > > > - What parts need information about this?
> > > >   -> Do the slot profiles need information about the GPU?
> > > >   -> Can the GPU Manager be a "self contained" thing that simply takes the configuration, and then abstracts everything internally? Operators can access it via "GPUManager.get()" or so?
> > > >
> > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze Guo <karma...@gmail.com> wrote:
> > > > >
> > > > > Thanks for all the feedback.
> > > > >
> > > > > @Becket
> > > > > Regarding the WebUI and GPUInfo, you're right, I'll add them to the Public API section.
> > > > >
> > > > > @Stephan @Becket
> > > > > Regarding the general extended resource mechanism, I second Xintong's suggestion.
> > > > > - It's better to leverage ResourceProfile and ResourceSpec after we support fine-grained GPU scheduling. As a first-step proposal, I prefer not to include it in the scope of this FLIP.
> > > > > - Regarding the "Extended Resource Manager", if I understand correctly, it is just a code refactoring atm; we could extract the open/close/allocateExtendResources of GPUManager to that interface. If that is the case, +1 to do it during implementation.
> > > > >
> > > > > @Xingbo
> > > > > As Xintong said, we looked into how Spark supports a general "Custom Resource Scheduling" before and decided to introduce a common resource configuration schema (taskmanager.resource.{resourceName}.amount/discovery-script) to make it more extensible. I think "resource" is a proper level to contain all the configs of extended resources.
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo Huang <hxbks...@gmail.com> wrote:
> > > > > >
> > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > >
> > > > > > There is no doubt that GPU resource management support will greatly facilitate the development of AI-related applications by PyFlink users.
> > > > > >
> > > > > > I have only one comment about this wiki:
> > > > > >
> > > > > > Regarding the names of several GPU configurations, I think it is better to drop the "resource" field to make them consistent with the names of other resource-related configurations in TaskManagerOptions, e.g.
> > > > > > taskmanager.resource.gpu.discovery-script.path -> taskmanager.gpu.discovery-script.path
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xingbo
> > > > > >
> > > > > > On Wed, Mar 4, 2020 at 10:39 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > >
> > > > > > > @Stephan, @Becket,
> > > > > > >
> > > > > > > Actually, Yangze, Yang and I also had an offline discussion about making the "GPU Support" a more general "Extended Resource Support". We believe supporting extended resources with a general mechanism is definitely a good and extensible way. The reason we propose this FLIP narrowing its scope down to GPU alone is mainly the concern about the extra effort and review capacity needed for a general mechanism.
> > > > > > >
> > > > > > > To come up with a good design for a general extended resource management mechanism, we would need to investigate more on how people use different kinds of resources in practice. For GPU, we learnt such knowledge from the experts, Becket and his team members. But for FPGA, or other potential extended resources, we don't have such convenient information sources, which makes the investigation require more effort, and I tend to think that is not necessary atm.
> > > > > > >
> > > > > > > On the other hand, we also looked into how Spark supports a general "Custom Resource Scheduling". Assuming we want to have a similar general extended resource mechanism in the future, we believe that the current GPU support design can be easily extended, in an incremental way without too much rework.
> > > > > > >
> > > > > > > - The most important part is probably the user interfaces. Spark offers configuration options to define the amount, discovery script and vendor (on K8s) on a per-resource-type basis [1], which is very similar to what we proposed in this FLIP. I think it's not necessary to expose config options in the general way atm, since we do not have support for other resource types now. If we later decide to have per-resource-type config options, we can keep backwards compatibility with the currently proposed options through simple key mapping.
> > > > > > > - For the GPU Manager, if later needed we can change it to an "Extended Resource Manager" (or whatever it is called). That should be a pure component-internal refactoring.
> > > > > > > - For ResourceProfile and ResourceSpec, there are already fields for general extended resources. We can of course leverage them when supporting fine-grained GPU scheduling. That is also not in the scope of this first-step proposal, and would require FLIP-56 to be finished first.
> > > > > > >
> > > > > > > To sum up, I agree with Becket that we should have a separate FLIP for the general extended resource mechanism, and keep it in mind when discussing and implementing the current one.
> > > > > > >
> > > > > > > Thank you~
> > > > > > >
> > > > > > > Xintong Song
> > > > > > >
> > > > > > > [1]
> > > > > > > https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
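As a concrete illustration of the per-resource-type configuration schema mentioned above (taskmanager.resource.{resourceName}.amount / discovery-script), here is a rough sketch using Flink's ConfigOptions builder; the exact key names, types, and defaults are still subject to the FLIP discussion:

```java
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

// Sketch of possible GPU config options following the
// "taskmanager.resource.{resourceName}.*" schema discussed in this thread.
public class GPUResourceOptions {

    public static final ConfigOption<Integer> GPU_AMOUNT =
            ConfigOptions.key("taskmanager.resource.gpu.amount")
                    .intType()
                    .defaultValue(0)
                    .withDescription("Number of GPUs per TaskExecutor.");

    public static final ConfigOption<String> DISCOVERY_SCRIPT_PATH =
            ConfigOptions.key("taskmanager.resource.gpu.discovery-script.path")
                    .stringType()
                    .noDefaultValue()
                    .withDescription("Path of the GPU discovery script.");

    public static final ConfigOption<String> DISCOVERY_SCRIPT_ARGS =
            ConfigOptions.key("taskmanager.resource.gpu.discovery-script.args")
                    .stringType()
                    .noDefaultValue()
                    .withDescription("Arguments passed to the GPU discovery script.");
}
```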
> > > > > > > On Wed, Mar 4, 2020 at 9:18 AM Becket Qin <becket....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > That's a good point, Stephan. It makes total sense to generalize the resource management to support custom resources. Having that allows users to add new resources by themselves. The general resource management may involve two different aspects:
> > > > > > > >
> > > > > > > > 1. The custom resource type definition. It is supported by the extended resources in ResourceProfile and ResourceSpec. This will likely cover the majority of cases.
> > > > > > > >
> > > > > > > > 2. The custom resource allocation logic, i.e. how to assign the resources to different tasks, operators, and so on. This may require two levels / steps:
> > > > > > > >   a. Subtask level - make sure the subtasks are put into suitable slots. It is done by the global RM and is not customizable right now.
> > > > > > > >   b. Operator level - map the exact resources to the operators in the TM, e.g. GPU 1 for operator A, GPU 2 for operator B. This step is needed assuming the global RM does not distinguish individual resources of the same type. It is true for memory, but not for GPU.
> > > > > > > >
> > > > > > > > The GPU manager is designed to do 2.b here. So it should discover the physical GPU information and bind/match it to each operator. Making this general will fill in the missing piece to support custom resource type definition. But I'd avoid calling it an "External Resource Manager" to avoid confusion with the RM; maybe something like "Operator Resource Assigner" would be more accurate. So for each resource type users can have an optional "Operator Resource Assigner" in the TM. For memory, users don't need this, but for other extended resources, users may need it.
> > > > > > > >
> > > > > > > > Personally I think a pluggable "Operator Resource Assigner" is achievable in this FLIP. But I am also OK with having that in a separate FLIP, because the interface between the "Operator Resource Assigner" and the operator may take a while to settle down if we want to make it generic. But I think our implementation should take this future work into consideration so that we don't need to break backwards compatibility once we have that.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jiangjie (Becket) Qin
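Since Becket notes the interface may take a while to settle, the following is purely a strawman of what a pluggable per-resource-type assigner could look like; all names and methods are hypothetical and not part of the FLIP:

```java
import java.util.List;

// Strawman for the "Operator Resource Assigner" idea: one assigner per extended
// resource type, living in the TaskManager, mapping concrete resource instances
// (e.g. GPU indexes) to operators. Illustrative only.
public interface OperatorResourceAssigner<R> {

    /** The extended resource type this assigner handles, e.g. "gpu" or "fpga". */
    String resourceType();

    /** Discovers the concrete resource instances available to this TaskManager. */
    List<R> discoverResources();

    /**
     * Assigns some of the discovered resources to one operator, e.g. when the
     * operator is opened. Whether instances are exclusive or shared between
     * operators is up to the concrete assigner.
     */
    List<R> assignToOperator(String operatorId, int amount);

    /** Releases the resources previously assigned to the given operator. */
    void releaseForOperator(String operatorId);
}
```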
> > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM Stephan Ewen <se...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > Thank you for writing this FLIP.
> > > > > > > > >
> > > > > > > > > I cannot really give much input into the mechanics of GPU-aware scheduling and GPU allocation, as I have no experience with that.
> > > > > > > > >
> > > > > > > > > One thought I had when reading the proposal is whether it makes sense to look at the "GPU Manager" as an "External Resource Manager", with GPU being one such resource.
> > > > > > > > > The way I understand the ResourceProfile and ResourceSpec, that is how it is done there.
> > > > > > > > > It has the advantage that it looks more extensible. Maybe there is a GPU Resource, a specialized NVIDIA GPU Resource, an FPGA Resource, an Alibaba TPU Resource, etc.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Stephan
> > > > > > > > >
> > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM Becket Qin <becket....@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks for the FLIP, Yangze. GPU resource management support is a must-have for machine learning use cases. Actually, it is one of the most frequently asked questions from users who are interested in using Flink for ML.
> > > > > > > > > >
> > > > > > > > > > Some quick comments / questions on the wiki:
> > > > > > > > > > 1. The WebUI / REST API should probably also be mentioned in the public interface section.
> > > > > > > > > > 2. Is the data structure that holds GPU info also a public API?
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Tue, Mar 3, 2020 at 10:15 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks for drafting the FLIP and kicking off the discussion, Yangze.
> > > > > > > > > > >
> > > > > > > > > > > Big +1 for this feature. Supporting the use of GPUs in Flink is significant, especially for ML scenarios.
> > > > > > > > > > > I've reviewed the FLIP wiki doc and it looks good to me. I think it's a very good first step for Flink's GPU support.
> > > > > > > > > > >
> > > > > > > > > > > Thank you~
> > > > > > > > > > >
> > > > > > > > > > > Xintong Song
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 2, 2020 at 12:06 PM Yangze Guo <karma...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > >
> > > > > > > > > > > > We would like to start a discussion thread on "FLIP-108: Add GPU support in Flink" [1].
> > > > > > > > > > > >
> > > > > > > > > > > > This FLIP mainly discusses the following issues:
> > > > > > > > > > > >
> > > > > > > > > > > > - Enable users to configure how many GPUs a task executor has and forward such requirements to the external resource managers (for Kubernetes/Yarn/Mesos setups).
> > > > > > > > > > > > - Provide information about available GPU resources to operators.
> > > > > > > > > > > >
> > > > > > > > > > > > Key changes proposed in the FLIP are as follows:
> > > > > > > > > > > >
> > > > > > > > > > > > - Forward GPU resource requirements to Yarn/Kubernetes.
> > > > > > > > > > > > - Introduce GPUManager as one of the task manager services to discover and expose GPU resource information to the context of functions.
> > > > > > > > > > > > - Introduce a default script for GPU discovery, in which we provide a privilege mode to help users achieve worker-level isolation in standalone mode.
> > > > > > > > > > > >
> > > > > > > > > > > > Please find more details in the FLIP wiki document [1]. Looking forward to your feedback.
> > > > > > > > > > > >
> > > > > > > > > > > > [1]
> > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Yangze Guo
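To make the discovery step in the key changes above a bit more concrete, here is a minimal sketch of how a GPUManager-like component might invoke a discovery script and parse its output; it assumes the script prints one GPU index per line, which is an illustrative contract only, since the actual output format, privilege mode, and error handling are defined in the FLIP wiki and may differ:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of running a GPU discovery script and collecting the
// reported GPU indexes. Illustrative only; not the FLIP's actual discovery code.
public final class GPUDiscoverySketch {

    /** Runs the discovery script with the requested amount and returns the GPU indexes it reports. */
    public static List<Integer> discover(String scriptPath, int amount)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(scriptPath, String.valueOf(amount)).start();
        List<Integer> indexes = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    // Assumed contract: one GPU index per non-empty output line.
                    indexes.add(Integer.parseInt(trimmed));
                }
            }
        }
        if (process.waitFor() != 0) {
            throw new IOException("GPU discovery script exited with a non-zero code.");
        }
        return indexes;
    }
}
```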