Re: [DISCUSS] FLIP-108: Add GPU support in Flink

Till Rohrmann Mon, 30 Mar 2020 02:18:01 -0700

At the moment the RM does not have a user code class loader and I agree
with Stephan that it should stay like this. This, however, does not mean
that we cannot support pluggable components in the RM. As long as the
plugins are on the system's class path, it should be fine for the RM to
load them. For example, we could add external resources via Flink's plugin
mechanism or something similar.


A very simple implementation of such an ExternalResourceDriver could be a
class which simply returns what is written in the flink-conf.yaml under a
given key.
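
A minimal sketch of such a driver, written in Java. The interface shape,
the class name ConfigKeyExternalResourceDriver, and the use of a plain Map
in place of Flink's configuration object are all assumptions made for
illustration; the FLIP-108 interface may end up looking different.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the driver interface discussed in this thread.
// It is NOT the interface from the FLIP, only an illustration.
interface ExternalResourceDriver {
    // Returns the value this driver contributes, if the key is configured.
    Optional<String> getResourceValue(Map<String, String> flinkConf);
}

// Returns whatever is written in the (already parsed) flink-conf.yaml
// under one fixed key. It runs no user code and needs no extra dependencies.
final class ConfigKeyExternalResourceDriver implements ExternalResourceDriver {
    private final String configKey;

    ConfigKeyExternalResourceDriver(String configKey) {
        this.configKey = configKey;
    }

    @Override
    public Optional<String> getResourceValue(Map<String, String> flinkConf) {
        // A missing key yields an empty Optional instead of null.
        return Optional.ofNullable(flinkConf.get(configKey));
    }
}
```

With a configuration containing, say, the key
external-resource.gpu.kubernetes.key (following the naming Yangze proposed
below) mapped to an illustrative value such as nvidia.com/gpu, the driver
would return that value unchanged.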

Cheers,
Till

On Mon, Mar 30, 2020 at 5:39 AM Yangze Guo <[email protected]> wrote:

> Hi, Stephan,
>
> I see your concern and I totally agree with you.
>
> The interface on RM side is now `Map<String key, String/Long value>
> getYarn/KubernetesExternalResource()`. The only valid information the RM
> gets from it is the configuration key of that external resource in
> Yarn/K8s. The "String/Long value" would be the same as the
> external-resource.{resourceName}.amount.
> So, I think it makes sense to replace these two interfaces with two
> configs, i.e. external-resource.{resourceName}.yarn/kubernetes.key. We
> may lose some extensibility, but AFAIK it could work with common
> external resources like GPU, FPGA. WDYT?
>
> Best,
> Yangze Guo
>
> On Fri, Mar 27, 2020 at 7:59 PM Stephan Ewen <[email protected]> wrote:
> >
> > Maybe one final comment: It is probably not an issue, but let's try and
> > keep user code (via user code classloader) out of the ResourceManager, if
> > possible.
> >
> > As background:
> >
> > There were thoughts in the past to support setups where the RM must run
> > with "superuser" credentials, but we cannot run JM/TM with these
> > credentials, as the user code might access them otherwise.
> > This is actually possible today: you can run the RM in a different JVM or
> > in a different container, and give it more credentials than JMs / TMs. But
> > for this to be feasible, we cannot allow any user-defined code to be in the
> > JVM, because that instantaneously breaks the isolation of credentials.
> >
> >
> >
> > On Fri, Mar 27, 2020 at 4:01 AM Yangze Guo <[email protected]> wrote:
> >
> > > Thanks for the feedback, @Till and @Xintong.
> > >
> > > Regarding separating the interface, I'm also +1 with it.
> > >
> > > Regarding the resource allocation interface, true, it's dangerous to
> > > give too much access to user code. Changing the return type to Map<String
> > > key, String/Long value> makes sense to me. AFAIK, it is compatible
> > > with all the first-party supported resources for Yarn/Kubernetes. It
> > > could also free us from the potential dependency issue.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Mar 27, 2020 at 10:42 AM Xintong Song <[email protected]>
> > > wrote:
> > > >
> > > > Thanks for updating the FLIP, Yangze.
> > > >
> > > > I agree with Till that we probably want to separate the K8s/Yarn
> > > > decorator calls. Users can still configure one driver class, and we
> > > > can use `instanceof` to check whether the driver implements K8s/Yarn
> > > > specific interfaces.
> > > >
> > > > Moreover, I'm not sure about exposing the entire `ContainerRequest` /
> > > > `Pod` (`AbstractKubernetesStepDecorator` directly manipulates `Pod`) to
> > > > user code. It gives user code more access than needed for defining an
> > > > external resource, which might cause problems. Instead, I would suggest
> > > > having an interface like `Map<String key, String value>
> > > > getYarn/KubernetesExternalResource()` and assembling the results into
> > > > `ContainerRequest` / `Pod` in Yarn/KubernetesResourceManager.
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Mar 27, 2020 at 1:10 AM Till Rohrmann <[email protected]>
> > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I'm a bit late to the party. I think the current proposal looks good.
> > > > >
> > > > > Concerning the ExternalResourceDriver interface defined in the FLIP
> > > > > [1], I would suggest not including the decorator calls for Kubernetes
> > > > > and Yarn in the base interface. Instead, I would suggest segregating
> > > > > the deployment specific decorator calls into separate interfaces. That
> > > > > way an ExternalResourceDriver does not have to support all deployments
> > > > > from the very beginning. Moreover, some resources might not be
> > > > > supported by a specific deployment target and the natural way to
> > > > > express this would be to not implement the respective deployment
> > > > > specific interface.
> > > > >
> > > > > Moreover, having void
> > > > > addExternalResourceToRequest(AMRMClient.ContainerRequest containerRequest)
> > > > > in the ExternalResourceDriver interface would require Hadoop on
> > > > > Flink's classpath whenever the external resource driver is being used.
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Thu, Mar 26, 2020 at 12:45 PM Stephan Ewen <[email protected]>
> > > wrote:
> > > > >
> > > > > > Nice, thanks a lot!
> > > > > >
> > > > > > On Thu, Mar 26, 2020 at 10:21 AM Yangze Guo <[email protected]>
> > > wrote:
> > > > > >
> > > > > > > Thanks for the suggestion, @Stephan, @Becket and @Xintong.
> > > > > > >
> > > > > > > I've updated the FLIP accordingly. I did not add a
> > > > > > > ResourceInfoProvider. Instead, I introduced the
> > > > > > > ExternalResourceDriver, which takes responsibility for all
> > > > > > > relevant operations on both the RM and TM sides.
> > > > > > > After rethinking the decoupling of external resource management
> > > > > > > from TaskExecutor, I think we could do the same thing on the
> > > > > > > ResourceManager side. We do not need to add specific allocation
> > > > > > > logic to the ResourceManager each time we add a specific external
> > > > > > > resource.
> > > > > > > - For Yarn, we need the ExternalResourceDriver to edit the
> > > > > > > containerRequest.
> > > > > > > - For Kubernetes, ExternalResourceDriver could provide a decorator
> > > > > > > for the TM pod.
> > > > > > >
> > > > > > > In this way, just like MetricReporter, we allow users to define
> > > > > > > their custom ExternalResourceDriver. It is more extensible and fits
> > > > > > > the separation of concerns. For more details, please take a look at
> > > > > > > [1].
> > > > > > >
> > > > > > > [1]
> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > >
> > > > > > > Best,
> > > > > > > Yangze Guo
> > > > > > >
> > > > > > > On Wed, Mar 25, 2020 at 7:32 PM Stephan Ewen <[email protected]
> >
> > > wrote:
> > > > > > > >
> > > > > > > > This sounds good to go ahead from my side.
> > > > > > > >
> > > > > > > > I like the approach that Becket suggested - in that case the
> > > > > > > > core abstraction that everyone would need to understand would be
> > > > > > > > "external resource allocation" and the "ResourceInfoProvider",
> > > > > > > > and the GPU specific code would be a specific implementation only
> > > > > > > > known to that component that allocates the external resource.
> > > > > > > > That fits the separation of concerns well.
> > > > > > > >
> > > > > > > > I also understand that it should not be over-engineered in the
> > > > > > > > first version, so some simplification makes sense, and then
> > > > > > > > gradually expand from there.
> > > > > > > >
> > > > > > > > So +1 to go ahead with what was suggested above (Xintong /
> > > > > > > > Becket) from my side.
> > > > > > > >
> > > > > > > > On Mon, Mar 23, 2020 at 6:55 AM Xintong Song <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the comments, Stephan & Becket.
> > > > > > > > >
> > > > > > > > > @Stephan
> > > > > > > > >
> > > > > > > > > I see your concern, and I completely agree with you that we
> > > should
> > > > > > > first
> > > > > > > > > think about the "library" / "plugin" / "extension" style if
> > > > > possible.
> > > > > > > > >
> > > > > > > > > If GPUs are sliced and assigned during scheduling, there
> may be
> > > > > > reason,
> > > > > > > > > > although it looks that it would belong to the slot then.
> Is
> > > that
> > > > > > > what we
> > > > > > > > > > are doing here?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > In the current proposal, we do not have the GPUs sliced and
> > > > > assigned
> > > > > > to
> > > > > > > > > slots, because it could be problematic without dynamic slot
> > > > > > allocation.
> > > > > > > > > E.g., the number of GPUs might not be evenly divisible by
> the
> > > > > number
> > > > > > of
> > > > > > > > > slots.
> > > > > > > > >
> > > > > > > > > I think it makes sense to eventually have the GPUs
> assigned to
> > > > > slots.
> > > > > > > Even
> > > > > > > > > then, we might still need a TM level GPUManager (or
> > > > > ResourceProvider
> > > > > > > like
> > > > > > > > > Becket suggested). For memory, in each slot we can simply
> > > request
> > > > > the
> > > > > > > > > amount of memory, leaving it to JVM / OS to decide which
> memory
> > > > > > > (address)
> > > > > > > > > should be assigned. For GPU, and potentially other
> resources
> > > like
> > > > > > > FPGA, we
> > > > > > > > > need to explicitly specify which GPU (index) should be
> used.
> > > > > > > Therefore, we
> > > > > > > > > need some component at the TM level to coordinate which
> slot
> > > uses
> > > > > > which
> > > > > > > > > GPU.
> > > > > > > > >
> > > > > > > > > IMO, unless we say Flink will not support slot-level GPU
> > > slicing at
> > > > > > > least
> > > > > > > > > in the foreseeable future, I don't see a good way to avoid
> > > touching
> > > > > > > the TM
> > > > > > > > > core. To that end, I think Becket's suggestion points to a
> good
> > > > > > > direction,
> > > > > > > > > that supports more features (GPU, FPGA, etc.) with less
> > > coupling to
> > > > > > > the TM
> > > > > > > > > core (only needs to understand the general interfaces). The
> > > > > detailed
> > > > > > > > > implementation for specific resource types can even be
> > > encapsulated
> > > > > > as
> > > > > > > a
> > > > > > > > > library.
> > > > > > > > >
> > > > > > > > > @Becket
> > > > > > > > >
> > > > > > > > > Thanks for sharing your thought on the final state.
> Despite the
> > > > > > > details how
> > > > > > > > > the interfaces should look like, I think this is a really
> good
> > > > > > > abstraction
> > > > > > > > > for supporting general resource types.
> > > > > > > > >
> > > > > > > > > I'd like to further clarify that the following three things
> > > > > > > > > are all that the "Flink core" needs to understand.
> > > > > > > > >
> > > > > > > > >    - The *amount* of resource, for scheduling. Actually, we
> > > > > > > > >    already have the Resource class in ResourceProfile and
> > > > > > > > >    ResourceSpec for extended resources. It's just not really
> > > > > > > > >    used.
> > > > > > > > >    - The *info* that Flink provides to the operators / user
> > > > > > > > >    code.
> > > > > > > > >    - The *provider*, which generates the info based on the
> > > > > > > > >    amount.
> > > > > > > > >
> > > > > > > > > The "core" does not need to understand the specific
> > > > > > > > > implementation details of the above three. They can even be
> > > > > > > > > implemented in a 3rd-party library, similar to how we allow
> > > > > > > > > users to define their custom MetricReporter.
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Mar 23, 2020 at 8:45 AM Becket Qin <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the comment, Stephan.
> > > > > > > > > >
> > > > > > > > > >   - If everything becomes a "core feature", it will make
> the
> > > > > > project
> > > > > > > hard
> > > > > > > > > > > to develop in the future. Thinking "library" /
> "plugin" /
> > > > > > > "extension"
> > > > > > > > > > style
> > > > > > > > > > > where possible helps.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Completely agree. It is much more important to design a
> > > > > > > > > > mechanism than to focus on a specific case. Here is what I
> > > > > > > > > > am thinking to fully support custom resource management:
> > > > > > > > > >
> > > > > > > > > > 1. On the JM / RM side, use ResourceProfile and
> > > > > > > > > > ResourceSpec to define the resource and the amount
> > > > > > > > > > required. They will be used to find suitable TM slots to
> > > > > > > > > > run the tasks. At this point, the resources are only
> > > > > > > > > > measured by amount, i.e. they do not have individual IDs.
> > > > > > > > > >
> > > > > > > > > > 2. On the TM side, have something like
> > > > > > > > > > *"ResourceInfoProvider"* to identify and provide the
> > > > > > > > > > detailed information of the individual resource, e.g. GPU
> > > > > > > > > > ID. It is important because the operator may have to
> > > > > > > > > > explicitly interact with the physical resource it uses.
> > > > > > > > > > The ResourceInfoProvider might look something like this:
> > > > > > > > > >
> > > > > > > > > > interface ResourceInfoProvider<INFO> {
> > > > > > > > > >     Map<AbstractID, INFO> retrieveResourceInfo(
> > > > > > > > > >         OperatorId opId, ResourceProfile resourceProfile);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > - There could be several "*ResourceInfoProvider*" instances
> > > > > > > > > > configured on the TM to retrieve the information for
> > > > > > > > > > different resources.
> > > > > > > > > > - The TM will be responsible for assigning those individual
> > > > > > > > > > resources to each operator according to their requested
> > > > > > > > > > amount.
> > > > > > > > > > - The operators will be able to get the ResourceInfo from
> > > > > > > > > > their RuntimeContext.
> > > > > > > > > >
> > > > > > > > > > If we agree this is a reasonable final state, we can adapt
> > > > > > > > > > the current FLIP to it. In fact, it does not sound like a
> > > > > > > > > > big change to me. All the proposed configuration can stay
> > > > > > > > > > as is; it is just that Flink itself won't care about it.
> > > > > > > > > > Instead, a GPUInfoProvider implementing the
> > > > > > > > > > ResourceInfoProvider will use it.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 23, 2020 at 1:47 AM Stephan Ewen <
> > > [email protected]>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi all!
> > > > > > > > > > >
> > > > > > > > > > > The main point I wanted to throw into the discussion
> is the
> > > > > > > following:
> > > > > > > > > > >   - With more and more use cases, more and more tools
> go
> > > into
> > > > > > Flink
> > > > > > > > > > >   - If everything becomes a "core feature", it will
> make
> > > the
> > > > > > > project
> > > > > > > > > hard
> > > > > > > > > > > to develop in the future. Thinking "library" /
> "plugin" /
> > > > > > > "extension"
> > > > > > > > > > style
> > > > > > > > > > > where possible helps.
> > > > > > > > > > >
> > > > > > > > > > >   - A good thought experiment is always: How many
> future
> > > > > > developers
> > > > > > > > > have
> > > > > > > > > > to
> > > > > > > > > > > interact with this code (and possibly understand it
> > > partially),
> > > > > > > even if
> > > > > > > > > > the
> > > > > > > > > > > features they touch have nothing to do with GPU
> support. If
> > > > > many
> > > > > > > > > > > contributors to unrelated features will have to touch
> it
> > > and
> > > > > > > understand
> > > > > > > > > > it,
> > > > > > > > > > > then let's think if there is a different solution.
> Maybe
> > > there
> > > > > is
> > > > > > > not,
> > > > > > > > > > but
> > > > > > > > > > > then we should be sure why.
> > > > > > > > > > >
> > > > > > > > > > >   - That led me to raising this issue: If the GPU
> manager
> > > > > > becomes a
> > > > > > > > > core
> > > > > > > > > > > service in the TaskManager, Environment,
> RuntimeContext,
> > > etc.
> > > > > > then
> > > > > > > > > > everyone
> > > > > > > > > > > developing TM and streaming tasks need to understand
> the
> > > GPU
> > > > > > > manager.
> > > > > > > > > > That
> > > > > > > > > > > seems oddly specific, is my impression.
> > > > > > > > > > >
> > > > > > > > > > > Access to configuration seems not the right reason to
> do
> > > that.
> > > > > We
> > > > > > > > > should
> > > > > > > > > > > expose the Flink configuration from the RuntimeContext
> > > anyways.
> > > > > > > > > > >
> > > > > > > > > > > If GPUs are sliced and assigned during scheduling,
> there
> > > may be
> > > > > > > reason,
> > > > > > > > > > > although it looks that it would belong to the slot
> then. Is
> > > > > that
> > > > > > > what
> > > > > > > > > we
> > > > > > > > > > > are doing here?
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Stephan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Mar 20, 2020 at 2:58 AM Xintong Song <
> > > > > > > [email protected]>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > >  Thanks for the feedback, Becket.
> > > > > > > > > > > >
> > > > > > > > > > > > IMO, eventually an operator should only see info of
> GPUs
> > > that
> > > > > > are
> > > > > > > > > > > dedicated
> > > > > > > > > > > > for it, instead of all GPUs on the machine/container
> in
> > > the
> > > > > > > current
> > > > > > > > > > > design.
> > > > > > > > > > > > It does not make sense to let the user who writes a
> UDF
> > > to
> > > > > > worry
> > > > > > > > > about
> > > > > > > > > > > > coordination among multiple operators running on the
> same
> > > > > > > machine.
> > > > > > > > > And
> > > > > > > > > > if
> > > > > > > > > > > > we want to limit the GPU info an operator sees, we
> > > should not
> > > > > > > let the
> > > > > > > > > > > > operator to instantiate GPUManager, which means we
> have
> > > to
> > > > > > expose
> > > > > > > > > > > something
> > > > > > > > > > > > through runtime context, either GPU info or some
> kind of
> > > > > > limited
> > > > > > > > > access
> > > > > > > > > > > to
> > > > > > > > > > > > the GPUManager.
> > > > > > > > > > > >
> > > > > > > > > > > > Thank you~
> > > > > > > > > > > >
> > > > > > > > > > > > Xintong Song
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 19, 2020 at 5:48 PM Becket Qin <
> > > > > > [email protected]
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > It probably makes sense for us to first agree on
> > > > > > > > > > > > > the final state. More specifically, will the
> > > > > > > > > > > > > resource info be exposed through runtime context
> > > > > > > > > > > > > eventually?
> > > > > > > > > > > > >
> > > > > > > > > > > > > If that is the final state and we have a seamless
> > > > > > > > > > > > > migration story from this FLIP to that final state,
> > > > > > > > > > > > > personally I think it is OK to expose the GPU info
> > > > > > > > > > > > > in the runtime context.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 16, 2020 at 11:21 AM Xintong Song <
> > > > > > > > > [email protected]
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > @Yangze,
> > > > > > > > > > > > > > I think what Stephan means (@Stephan, please
> correct
> > > me
> > > > > if
> > > > > > > I'm
> > > > > > > > > > wrong)
> > > > > > > > > > > > is
> > > > > > > > > > > > > > that, we might not need to hold and maintain the
> > > > > GPUManager
> > > > > > > as a
> > > > > > > > > > > > service
> > > > > > > > > > > > > in
> > > > > > > > > > > > > > TaskManagerServices or RuntimeContext. An
> > > alternative is
> > > > > to
> > > > > > > > > create
> > > > > > > > > > /
> > > > > > > > > > > > > > retrieve the GPUManager only in the operators
> that
> > > need
> > > > > it,
> > > > > > > e.g.,
> > > > > > > > > > > with
> > > > > > > > > > > > a
> > > > > > > > > > > > > > static method `GPUManager.get()`.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @Stephan,
> > > > > > > > > > > > > > I agree with you on excluding GPUManager from
> > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - For the first step, where we provide unified
> > > > > TM-level
> > > > > > > GPU
> > > > > > > > > > > > > information
> > > > > > > > > > > > > >    to all operators, it should be fine to have
> > > operators
> > > > > > > access /
> > > > > > > > > > > > > >    lazy-initiate GPUManager by themselves.
> > > > > > > > > > > > > >    - In future, we might have some more
> fine-grained
> > > GPU
> > > > > > > > > > management,
> > > > > > > > > > > > > where
> > > > > > > > > > > > > >    we need to maintain GPUManager as a service
> and
> > > put
> > > > > GPU
> > > > > > > info
> > > > > > > > > in
> > > > > > > > > > > slot
> > > > > > > > > > > > > >    profiles. But at least for now it's not
> necessary
> > > to
> > > > > > > introduce
> > > > > > > > > > > such
> > > > > > > > > > > > > >    complexity.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > However, I have some concerns on excluding
> GPUManager
> > > > > from
> > > > > > > > > > > > RuntimeContext
> > > > > > > > > > > > > > and let operators access it directly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >    - Configurations needed for creating the
> > > GPUManager is
> > > > > > not
> > > > > > > > > > always
> > > > > > > > > > > > > >    available for operators.
> > > > > > > > > > > > > >    - If later we want to have fine-grained
> control
> > > over
> > > > > GPU
> > > > > > > > > (e.g.,
> > > > > > > > > > > > > >    operators in each slot can only see GPUs
> reserved
> > > for
> > > > > > that
> > > > > > > > > > slot),
> > > > > > > > > > > > the
> > > > > > > > > > > > > >    approach cannot be easily extended.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I would suggest wrapping the GPUManager behind
> > > > > > > > > > > > > > RuntimeContext and only exposing the GPUInfo to
> > > > > > > > > > > > > > users. For now, we can declare a method
> > > > > > > > > > > > > > `getGPUInfo()` in RuntimeContext, with a default
> > > > > > > > > > > > > > definition that calls `GPUManager.get()` to get
> > > > > > > > > > > > > > the lazily-created GPUManager. If later we want
> > > > > > > > > > > > > > to create / retrieve GPUManager in a different
> > > > > > > > > > > > > > way, we can simply change how `getGPUInfo` is
> > > > > > > > > > > > > > implemented, without needing to change any public
> > > > > > > > > > > > > > interfaces.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 10:09 AM Yangze Guo <
> > > > > > > [email protected]>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @Stephan
> > > > > > > > > > > > > > > Do you mean Minicluster? Yes, it makes sense to
> > > share
> > > > > the
> > > > > > > GPU
> > > > > > > > > > > Manager
> > > > > > > > > > > > > > > in such scenario.
> > > > > > > > > > > > > > > If that's what you worry about, I'm +1 for
> holding
> > > > > > > > > > > > > > > GPUManager(ExternalResourceManagers) in
> > > TaskExecutor
> > > > > > > instead of
> > > > > > > > > > > > > > > TaskManagerServices.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Regarding the RuntimeContext/FunctionContext,
> it
> > > just
> > > > > > > holds the
> > > > > > > > > > GPU
> > > > > > > > > > > > > > > info instead of the GPU Manager. AFAIK, it's
> the
> > > only
> > > > > > > place we
> > > > > > > > > > > could
> > > > > > > > > > > > > > > pass GPU info to the
> > > RichFunction/UserDefinedFunction.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Sat, Mar 14, 2020 at 4:06 AM Isaac Godfried
> <
> > > > > > > > > > > [email protected]
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ---- On Fri, 13 Mar 2020 15:58:20 +0000
> > > > > > [email protected]
> > > > > > > > > wrote
> > > > > > > > > > > > ----
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can we somehow keep this out of the
> > > TaskManager
> > > > > > > services
> > > > > > > > > > > > > > > > > I fear that we could not. IMO, the
> > > GPUManager(or
> > > > > > > > > > > > > > > > > ExternalServicesManagers in future) is
> > > conceptually
> > > > > > > one of
> > > > > > > > > > the
> > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > manager services, just like MemoryManager
> > > before
> > > > > > 1.10.
> > > > > > > > > > > > > > > > > - It maintains/holds the GPU resource at TM
> > > level
> > > > > and
> > > > > > > all
> > > > > > > > > of
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > operators allocate the GPU resources from
> it.
> > > So,
> > > > > it
> > > > > > > should
> > > > > > > > > > be
> > > > > > > > > > > > > > > > > exclusive to a single TaskExecutor.
> > > > > > > > > > > > > > > > > - We could add a collection called
> > > > > > > ExternalResourceManagers
> > > > > > > > > > to
> > > > > > > > > > > > hold
> > > > > > > > > > > > > > > > > all managers of other external resources
> in the
> > > > > > future.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Can you help me understand why this needs the
> > > > > > > > > > > > > > > > addition in TaskManagerServices or in the
> > > > > > > > > > > > > > > > RuntimeContext?
> > > > > > > > > > > > > > > > Are you worried about the case when multiple
> Task
> > > > > > > Executors
> > > > > > > > > run
> > > > > > > > > > > in
> > > > > > > > > > > > > the
> > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > JVM? That's not common, but wouldn't it
> actually
> > > be
> > > > > > good
> > > > > > > in
> > > > > > > > > > that
> > > > > > > > > > > > case
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > share the GPU Manager, given that the GPU is
> > > shared?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ---------------------------
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > What parts need information about this?
> > > > > > > > > > > > > > > > > In this FLIP, operators need the
> information.
> > > Thus,
> > > > > > we
> > > > > > > > > expose
> > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > information to the
> > > RuntimeContext/FunctionContext.
> > > > > > The
> > > > > > > slot
> > > > > > > > > > > > profile
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > not aware of GPU resources as GPU is TM
> level
> > > > > > resource
> > > > > > > now.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Can the GPU Manager be a "self contained"
> > > thing
> > > > > > that
> > > > > > > > > simply
> > > > > > > > > > > > takes
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > configuration, and then abstracts
> everything
> > > > > > > internally?
> > > > > > > > > > > > > > > > > Yes, we just pass the path/args of the
> discover
> > > > > > script
> > > > > > > and
> > > > > > > > > > how
> > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > GPUs per TM to it. It takes the
> responsibility
> > > to
> > > > > get
> > > > > > > the
> > > > > > > > > GPU
> > > > > > > > > > > > > > > > > information and expose them to the
> > > > > > > > > > > RuntimeContext/FunctionContext
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > Operators. Meanwhile, we'd better not allow
> > > > > operators
> > > > > > > to
> > > > > > > > > > > directly
> > > > > > > > > > > > > > > > > access GPUManager, it should get what they
> want
> > > > > from
> > > > > > > > > Context.
> > > > > > > > > > > We
> > > > > > > > > > > > > > could
> > > > > > > > > > > > > > > > > then decouple the interface/implementation
> of
> > > > > > > GPUManager
> > > > > > > > > and
> > > > > > > > > > > > Public
> > > > > > > > > > > > > > > > > API.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Fri, Mar 13, 2020 at 7:26 PM Stephan
> Ewen <
> > > > > > > > > > [email protected]
> > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > It sounds fine to initially start with
> GPU
> > > > > specific
> > > > > > > > > support
> > > > > > > > > > > and
> > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > generalizing this once we better
> understand
> > > the
> > > > > > > space.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > About the implementation suggested in
> > > FLIP-108:
> > > > > > > > > > > > > > > > > > - Can we somehow keep this out of the
> > > TaskManager
> > > > > > > > > services?
> > > > > > > > > > > > > > Anything
> > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > have to pull through all layers of the TM
> > > makes
> > > > > the
> > > > > > > TM
> > > > > > > > > > > > components
> > > > > > > > > > > > > > yet
> > > > > > > > > > > > > > > > > more
> > > > > > > > > > > > > > > > > > complex and harder to maintain.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > - What parts need information about this?
> > > > > > > > > > > > > > > > > > -> do the slot profiles need information
> > > about
> > > > > the
> > > > > > > GPU?
> > > > > > > > > > > > > > > > > > -> Can the GPU Manager be a "self
> contained"
> > > > > thing
> > > > > > > that
> > > > > > > > > > > simply
> > > > > > > > > > > > > > takes
> > > > > > > > > > > > > > > > > > the configuration, and then abstracts
> > > everything
> > > > > > > > > > internally?
> > > > > > > > > > > > > > > Operators
> > > > > > > > > > > > > > > > > can
> > > > > > > > > > > > > > > > > > access it via "GPUManager.get()" or so?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 4:19 AM Yangze
> Guo <
> > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thanks for all the feedback.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Becket
> > > > > > > > > > > > > > > > > > > Regarding the WebUI and GPUInfo, you're
> > > right,
> > > > > > > I'll add
> > > > > > > > > > > them
> > > > > > > > > > > > to
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > Public API section.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Stephan @Becket
> > > > > > > > > > > > > > > > > > > Regarding the general extended resource
> > > > > > mechanism,
> > > > > > > I
> > > > > > > > > > second
> > > > > > > > > > > > > > > Xintong's
> > > > > > > > > > > > > > > > > > > suggestion.
> > > > > > > > > > > > > > > > > > > - It's better to leverage
> ResourceProfile
> > > and
> > > > > > > > > > ResourceSpec
> > > > > > > > > > > > > after
> > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > support fine-grained GPU
> scheduling. As
> > > a
> > > > > > first
> > > > > > > step
> > > > > > > > > > > > > > proposal, I
> > > > > > > > > > > > > > > > > > > prefer to not include it in the scope
> of
> > > this
> > > > > > FLIP.
> > > > > > > > > > > > > > > > > > > - Regarding the "Extended Resource
> > > Manager",
> > > > > if I
> > > > > > > > > > > understand
> > > > > > > > > > > > > > > > > > > > > correctly, it is just a code refactoring
> atm,
> > > we
> > > > > > could
> > > > > > > > > > extract
> > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > open/close/allocateExtendResources of
> > > > > GPUManager
> > > > > > to
> > > > > > > > > that
> > > > > > > > > > > > > > > interface. If
> > > > > > > > > > > > > > > > > > > that is the case, +1 to do it during
> > > > > > > implementation.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > @Xingbo
> > > > > > > > > > > > > > > > > > > As Xintong said, we looked into how
> Spark
> > > > > > supports
> > > > > > > a
> > > > > > > > > > > general
> > > > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > > > > Resource Scheduling" before and
> decided to
> > > > > > > introduce a
> > > > > > > > > > > common
> > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > configuration
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > >
> > > schema(taskmanager.resource.{resourceName}.amount/discovery-script)
> > > > > > > > > > > > > > > > > > > to make it more extensible. I think the
> > > > > > "resource"
> > > > > > > is a
> > > > > > > > > > > > proper
> > > > > > > > > > > > > > > level
> > > > > > > > > > > > > > > > > > > to contain all the configs of extended
> > > > > resources.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 10:48 AM Xingbo
> > > Huang <
> > > > > > > > > > > > > [email protected]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Thanks a lot for the FLIP, Yangze.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > There is no doubt that GPU resource
> > > > > management
> > > > > > > > > support
> > > > > > > > > > > will
> > > > > > > > > > > > > > > greatly
> > > > > > > > > > > > > > > > > > > > facilitate the development of
> AI-related
> > > > > > > applications
> > > > > > > > > > by
> > > > > > > > > > > > > > PyFlink
> > > > > > > > > > > > > > > > > users.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > I have only one comment about this
> wiki:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Regarding the names of several GPU
> > > > > > > configurations, I
> > > > > > > > > > > think
> > > > > > > > > > > > it
> > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > better
> > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > delete the resource field to make it
> > > consistent
> > > > > > > with
> > > > > > > > > the
> > > > > > > > > > > > names
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > > resource-related configurations in
> > > > > > > TaskManagerOption.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > e.g.
> > > > > > > taskmanager.resource.gpu.discovery-script.path
> > > > > > > > > ->
> > > > > > > > > > > > > > > > > > > > taskmanager.gpu.discovery-script.path
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Xingbo
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Xintong Song <[email protected]>
> > > > > > > > wrote on Wed, Mar 4, 2020
> > > > > > > > > > > > > at 10:39 AM:
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > @Stephan, @Becket,
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Actually, Yangze, Yang and I also
> had
> > > an
> > > > > > > offline
> > > > > > > > > > > > discussion
> > > > > > > > > > > > > > > about
> > > > > > > > > > > > > > > > > > > making
> > > > > > > > > > > > > > > > > > > > > the "GPU Support" as some general
> > > "Extended
> > > > > > > > > Resource
> > > > > > > > > > > > > > Support".
> > > > > > > > > > > > > > > We
> > > > > > > > > > > > > > > > > > > believe
> > > > > > > > > > > > > > > > > > > > > supporting extended resources in a
> > > general
> > > > > > > > > mechanism
> > > > > > > > > > is
> > > > > > > > > > > > > > > definitely
> > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > good
> > > > > > > > > > > > > > > > > > > > > and extensible way. The reason we
> > > propose
> > > > > > this
> > > > > > > FLIP
> > > > > > > > > > > > > narrowing
> > > > > > > > > > > > > > > its
> > > > > > > > > > > > > > > > > scope
> > > > > > > > > > > > > > > > > > > > > down to GPU alone, is mainly for
> the
> > > > > concern
> > > > > > on
> > > > > > > > > extra
> > > > > > > > > > > > > efforts
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > review
> > > > > > > > > > > > > > > > > > > > > capacity needed for a general
> > > mechanism.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > To come up with a good design for a
> > > general
> > > > > > > extended
> > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > > > > > mechanism, we would need to
> investigate
> > > > > more
> > > > > > > on how
> > > > > > > > > > > > people
> > > > > > > > > > > > > > use
> > > > > > > > > > > > > > > > > > > different
> > > > > > > > > > > > > > > > > > > > > > > kinds of resources in practice. For
> > > GPU, we
> > > > > > > learnt
> > > > > > > > > > such
> > > > > > > > > > > > > > > knowledge
> > > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > experts, Becket and his team
> members.
> > > But
> > > > > for
> > > > > > > FPGA,
> > > > > > > > > > or
> > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > potential
> > > > > > > > > > > > > > > > > > > > > extended resources, we don't have
> such
> > > > > > > convenient
> > > > > > > > > > > > > information
> > > > > > > > > > > > > > > > > sources,
> > > > > > > > > > > > > > > > > > > > > > > so the investigation requires
> more
> > > > > > efforts,
> > > > > > > > > > which I
> > > > > > > > > > > > > tend
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > not necessary atm.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On the other hand, we also looked
> into
> > > how
> > > > > > > Spark
> > > > > > > > > > > > supports a
> > > > > > > > > > > > > > > general
> > > > > > > > > > > > > > > > > > > "Custom
> > > > > > > > > > > > > > > > > > > > > Resource Scheduling". Assuming we
> want
> > > to
> > > > > > have
> > > > > > > a
> > > > > > > > > > > similar
> > > > > > > > > > > > > > > general
> > > > > > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > > > > resource mechanism in the future,
> we
> > > > > believe
> > > > > > > that
> > > > > > > > > the
> > > > > > > > > > > > > current
> > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > > > > > > design can be easily extended, in
> an
> > > > > > > incremental
> > > > > > > > > way
> > > > > > > > > > > > > without
> > > > > > > > > > > > > > > too
> > > > > > > > > > > > > > > > > many
> > > > > > > > > > > > > > > > > > > > > reworks.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > - The most important part is
> probably
> > > user
> > > > > > > > > > interfaces.
> > > > > > > > > > > > > Spark
> > > > > > > > > > > > > > > > > offers
> > > > > > > > > > > > > > > > > > > > > configuration options to define the
> > > amount,
> > > > > > > > > discovery
> > > > > > > > > > > > > script
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > vendor
> > > > > > > > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > > > > > > > > > > > k8s) on a per resource type basis
> [1],
> > > which
> > > > > > is
> > > > > > > very
> > > > > > > > > > > > similar
> > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > what
> > > > > > > > > > > > > > > > > > > we
> > > > > > > > > > > > > > > > > > > > > proposed in this FLIP. I think
> it's not
> > > > > > > necessary
> > > > > > > > > to
> > > > > > > > > > > > expose
> > > > > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > > > > > > > in the general way atm, since we
> do not
> > > > > have
> > > > > > > > > > support
> > > > > > > > > > > for
> > > > > > > > > > > > > > other
> > > > > > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > > > types now. If later we decided to
> have
> > > per
> > > > > > > resource
> > > > > > > > > > > type
> > > > > > > > > > > > > > config
> > > > > > > > > > > > > > > > > > > > > options, we
> > > > > > > > > > > > > > > > > > > > > can have backwards compatibility
> on the
> > > > > > current
> > > > > > > > > > > proposed
> > > > > > > > > > > > > > > options
> > > > > > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > > > > > > > simple key mapping.
> > > > > > > > > > > > > > > > > > > > > - For the GPU Manager, if later
> needed
> > > we
> > > > > can
> > > > > > > > > change
> > > > > > > > > > it
> > > > > > > > > > > > to
> > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > "Extended
> > > > > > > > > > > > > > > > > > > > > Resource Manager" (or whatever it
> is
> > > > > called).
> > > > > > > That
> > > > > > > > > > > should
> > > > > > > > > > > > > be
> > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > pure
> > > > > > > > > > > > > > > > > > > > > component-internal refactoring.
> > > > > > > > > > > > > > > > > > > > > - For ResourceProfile and
> ResourceSpec,
> > > > > there
> > > > > > > are
> > > > > > > > > > > already
> > > > > > > > > > > > > > > > > fields for
> > > > > > > > > > > > > > > > > > > > > general extended resource. We can
> of
> > > course
> > > > > > > > > leverage
> > > > > > > > > > > them
> > > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > > > supporting
> > > > > > > > > > > > > > > > > > > > > fine grained GPU scheduling. That
> is
> > > also
> > > > > not
> > > > > > > in
> > > > > > > > > the
> > > > > > > > > > > > scope
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > first
> > > > > > > > > > > > > > > > > > > > > step proposal, and would require
> > > FLIP-56 to
> > > > > > be
> > > > > > > > > > finished
> > > > > > > > > > > > > > first.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > To sum up, I agree with Becket
> that
> > > > > we should have
> > > > > > a
> > > > > > > > > > separate
> > > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > general extended resource
> mechanism,
> > > and
> > > > > keep
> > > > > > > it in
> > > > > > > > > > > mind
> > > > > > > > > > > > > when
> > > > > > > > > > > > > > > > > > > discussing
> > > > > > > > > > > > > > > > > > > > > and implementing the current one.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> https://spark.apache.org/docs/3.0.0-preview/configuration.html#custom-resource-scheduling-and-configuration-overview
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 9:18 AM
> Becket
> > > Qin <
> > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > That's a good point, Stephan. It
> > > makes
> > > > > > total
> > > > > > > > > sense
> > > > > > > > > > to
> > > > > > > > > > > > > > > generalize
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > resource management to support
> custom
> > > > > > > resources.
> > > > > > > > > > > Having
> > > > > > > > > > > > > > that
> > > > > > > > > > > > > > > > > allows
> > > > > > > > > > > > > > > > > > > users
> > > > > > > > > > > > > > > > > > > > > > to add new resources by
> themselves.
> > > The
> > > > > > > general
> > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > involve two different aspects:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 1. The custom resource type
> > > definition.
> > > > > It
> > > > > > is
> > > > > > > > > > > supported
> > > > > > > > > > > > > by
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > extended
> > > > > > > > > > > > > > > > > > > > > > resources in ResourceProfile and
> > > > > > > ResourceSpec.
> > > > > > > > > This
> > > > > > > > > > > > will
> > > > > > > > > > > > > > > likely
> > > > > > > > > > > > > > > > > cover
> > > > > > > > > > > > > > > > > > > > > > > > the majority of the cases.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > 2. The custom resource allocation
> > > logic,
> > > > > > > i.e. how
> > > > > > > > > > to
> > > > > > > > > > > > > assign
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > resources
> > > > > > > > > > > > > > > > > > > > > > to different tasks, operators,
> and
> > > so on.
> > > > > > > This
> > > > > > > > > may
> > > > > > > > > > > > > require
> > > > > > > > > > > > > > > two
> > > > > > > > > > > > > > > > > > > levels /
> > > > > > > > > > > > > > > > > > > > > > steps:
> > > > > > > > > > > > > > > > > > > > > > a. Subtask level - make sure the
> > > subtasks
> > > > > > > are put
> > > > > > > > > > > into
> > > > > > > > > > > > > > > > > suitable
> > > > > > > > > > > > > > > > > > > > > slots.
> > > > > > > > > > > > > > > > > > > > > > It is done by the global RM and
> is
> > > not
> > > > > > > > > customizable
> > > > > > > > > > > > right
> > > > > > > > > > > > > > > now.
> > > > > > > > > > > > > > > > > > > > > > b. Operator level - map the exact
> > > > > resource
> > > > > > > to the
> > > > > > > > > > > > > operators
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > TM.
> > > > > > > > > > > > > > > > > > > > > e.g.
> > > > > > > > > > > > > > > > > > > > > > GPU 1 for operator A, GPU 2 for
> > > operator
> > > > > B.
> > > > > > > This
> > > > > > > > > > step
> > > > > > > > > > > > is
> > > > > > > > > > > > > > > needed
> > > > > > > > > > > > > > > > > > > assuming
> > > > > > > > > > > > > > > > > > > > > > the global RM does not
> distinguish
> > > > > > individual
> > > > > > > > > > > resources
> > > > > > > > > > > > > of
> > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > same
> > > > > > > > > > > > > > > > > > > type.
> > > > > > > > > > > > > > > > > > > > > > It is true for memory, but not
> for
> > > GPU.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > The GPU manager is designed to
> do 2.b
> > > > > here.
> > > > > > > So it
> > > > > > > > > > > > should
> > > > > > > > > > > > > > > > > discover the
> > > > > > > > > > > > > > > > > > > > > > physical GPU information and
> > > bind/match
> > > > > > them
> > > > > > > to
> > > > > > > > > > each
> > > > > > > > > > > > > > > > operator.
> > > > > > > > > > > > > > > > > > > Making
> > > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > general will fill in the missing
> > > piece to
> > > > > > > support
> > > > > > > > > > > > custom
> > > > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > > > type
> > > > > > > > > > > > > > > > > > > > > > definition. But I'd avoid
> calling it
> > > an
> > > > > > > "External
> > > > > > > > > > > > Resource
> > > > > > > > > > > > > > > > > Manager" to
> > > > > > > > > > > > > > > > > > > > > avoid
> > > > > > > > > > > > > > > > > > > > > > confusion with RM, maybe
> something
> > > like
> > > > > > > "Operator
> > > > > > > > > > > > > Resource
> > > > > > > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > > > > > > > would
> > > > > > > > > > > > > > > > > > > > > > be more accurate. So for each
> > > resource
> > > > > type
> > > > > > > users
> > > > > > > > > > can
> > > > > > > > > > > > > have
> > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > optional
> > > > > > > > > > > > > > > > > > > > > > "Operator Resource Assigner" in
> the
> > > TM.
> > > > > For
> > > > > > > > > memory,
> > > > > > > > > > > > users
> > > > > > > > > > > > > > > don't
> > > > > > > > > > > > > > > > > need
> > > > > > > > > > > > > > > > > > > > > this,
> > > > > > > > > > > > > > > > > > > > > > but for other extended resources,
> > > users
> > > > > may
> > > > > > > need
> > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Personally I think a pluggable
> > > "Operator
> > > > > > > Resource
> > > > > > > > > > > > > Assigner"
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > achievable
> > > > > > > > > > > > > > > > > > > > > > in this FLIP. But I am also OK
> with
> > > > > having
> > > > > > > that
> > > > > > > > > in
> > > > > > > > > > a
> > > > > > > > > > > > > > separate
> > > > > > > > > > > > > > > > > FLIP
> > > > > > > > > > > > > > > > > > > > > because
> > > > > > > > > > > > > > > > > > > > > > the interface between the
> "Operator
> > > > > > Resource
> > > > > > > > > > > Assigner"
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > operator
> > > > > > > > > > > > > > > > > > > may
> > > > > > > > > > > > > > > > > > > > > > take a while to settle down if we
> > > want to
> > > > > > > make it
> > > > > > > > > > > > > generic.
> > > > > > > > > > > > > > > But I
> > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > our
> > > > > > > > > > > > > > > > > > > > > > implementation should take this
> > > future
> > > > > work
> > > > > > > into
> > > > > > > > > > > > > > > consideration so
> > > > > > > > > > > > > > > > > > > that we
> > > > > > > > > > > > > > > > > > > > > > don't need to break backwards
> > > > > compatibility
> > > > > > > once
> > > > > > > > > we
> > > > > > > > > > > > have
> > > > > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > On Wed, Mar 4, 2020 at 12:27 AM
> > > Stephan
> > > > > > Ewen
> > > > > > > <
> > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Thank you for writing this
> FLIP.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > I cannot really give much input
> > > into
> > > > > the
> > > > > > > > > > mechanics
> > > > > > > > > > > of
> > > > > > > > > > > > > > > GPU-aware
> > > > > > > > > > > > > > > > > > > > > > scheduling
> > > > > > > > > > > > > > > > > > > > > > > and GPU allocation, as I have
> no
> > > > > > experience
> > > > > > > > > with
> > > > > > > > > > > > that.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > One thought I had when reading
> the
> > > > > > > proposal is
> > > > > > > > > if
> > > > > > > > > > > it
> > > > > > > > > > > > > > makes
> > > > > > > > > > > > > > > > > sense to
> > > > > > > > > > > > > > > > > > > > > look
> > > > > > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > > > > > > the "GPU Manager" as an
> "External
> > > > > > Resource
> > > > > > > > > > > Manager",
> > > > > > > > > > > > > and
> > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > is one
> > > > > > > > > > > > > > > > > > > > > such
> > > > > > > > > > > > > > > > > > > > > > > resource.
> > > > > > > > > > > > > > > > > > > > > > > The way I understand the
> > > > > ResourceProfile
> > > > > > > and
> > > > > > > > > > > > > > ResourceSpec,
> > > > > > > > > > > > > > > > > that is
> > > > > > > > > > > > > > > > > > > how
> > > > > > > > > > > > > > > > > > > > > it
> > > > > > > > > > > > > > > > > > > > > > > is done there.
> > > > > > > > > > > > > > > > > > > > > > > It has the advantage that it
> looks
> > > more
> > > > > > > > > > extensible.
> > > > > > > > > > > > > Maybe
> > > > > > > > > > > > > > > > > there is
> > > > > > > > > > > > > > > > > > > a
> > > > > > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > > Resource, a specialized NVIDIA
> GPU
> > > > > > > Resource,
> > > > > > > > > > an
> > > > > > > > > > > > FPGA
> > > > > > > > > > > > > > > > Resource, an
> > > > > > > > > > > > > > > > > > > > > Alibaba
> > > > > > > > > > > > > > > > > > > > > > > TPU Resource, etc.
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > Stephan
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 7:57 AM
> > > Becket
> > > > > > Qin <
> > > > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks for the FLIP Yangze.
> GPU
> > > > > > resource
> > > > > > > > > > > management
> > > > > > > > > > > > > > > support
> > > > > > > > > > > > > > > > > is a
> > > > > > > > > > > > > > > > > > > > > > > must-have
> > > > > > > > > > > > > > > > > > > > > > > > for machine learning use
> cases.
> > > > > > Actually
> > > > > > > it
> > > > > > > > > is
> > > > > > > > > > > one
> > > > > > > > > > > > of
> > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > most
> > > > > > > > > > > > > > > > > > > > > > > asked
> > > > > > > > > > > > > > > > > > > > > > > > > > questions from the users who
> are
> > > > > > > interested in
> > > > > > > > > > > using
> > > > > > > > > > > > > > Flink
> > > > > > > > > > > > > > > > > for ML.
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Some quick comments /
> questions
> > > to
> > > > > the
> > > > > > > wiki.
> > > > > > > > > > > > > > > > > > > > > > > > 1. The WebUI / REST API
> should
> > > > > probably
> > > > > > > also
> > > > > > > > > be
> > > > > > > > > > > > > > > mentioned in
> > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > > > > > > > > interface section.
> > > > > > > > > > > > > > > > > > > > > > > > 2. Is the data structure that
> > > holds
> > > > > GPU
> > > > > > > info
> > > > > > > > > > > also a
> > > > > > > > > > > > > > > public
> > > > > > > > > > > > > > > > > API?
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > On Tue, Mar 3, 2020 at 10:15
> AM
> > > > > Xintong
> > > > > > > Song
> > > > > > > > > <
> > > > > > > > > > > > > > > > > > > [email protected]>
> > > > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thanks for drafting the
> FLIP
> > > and
> > > > > > > kicking
> > > > > > > > > off
> > > > > > > > > > > the
> > > > > > > > > > > > > > > > > discussion,
> > > > > > > > > > > > > > > > > > > > > Yangze.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Big +1 for this feature.
> > > Supporting
> > > > > > > using
> > > > > > > > > of
> > > > > > > > > > > GPU
> > > > > > > > > > > > in
> > > > > > > > > > > > > > > Flink
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > > > > > > > > > significant,
> > > > > > > > > > > > > > > > > > > > > > > > > especially for the ML
> > > scenarios.
> > > > > > > > > > > > > > > > > > > > > > > > > I've reviewed the FLIP wiki
> > > doc and
> > > > > > it
> > > > > > > > > looks
> > > > > > > > > > > good
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > me. I
> > > > > > > > > > > > > > > > > > > think
> > > > > > > > > > > > > > > > > > > > > > it's a
> > > > > > > > > > > > > > > > > > > > > > > > > very good first step for
> > > Flink's
> > > > > GPU
> > > > > > > > > > > support.
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Thank you~
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > Xintong Song
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > On Mon, Mar 2, 2020 at
> 12:06 PM
> > > > > > Yangze
> > > > > > > Guo
> > > > > > > > > <
> > > > > > > > > > > > > > > > > [email protected]
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Hi everyone,
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > We would like to start a
> > > > > discussion
> > > > > > > > > thread
> > > > > > > > > > on
> > > > > > > > > > > > > > > "FLIP-108:
> > > > > > > > > > > > > > > > > Add
> > > > > > > > > > > > > > > > > > > GPU
> > > > > > > > > > > > > > > > > > > > > > > > > > support in Flink"[1].
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > This FLIP mainly
> discusses
> > > the
> > > > > > > following
> > > > > > > > > > > > issues:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > - Enable user to
> configure
> > > how
> > > > > many
> > > > > > > GPUs
> > > > > > > > > > in a
> > > > > > > > > > > > > task
> > > > > > > > > > > > > > > > > executor
> > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > > > forward such
> requirements to
> > > the
> > > > > > > external
> > > > > > > > > > > > > resource
> > > > > > > > > > > > > > > > > managers
> > > > > > > > > > > > > > > > > > > (for
> > > > > > > > > > > > > > > > > > > > > > > > > > Kubernetes/Yarn/Mesos
> > > setups).
> > > > > > > > > > > > > > > > > > > > > > > > > > - Provide information of
> > > > > available
> > > > > > > GPU
> > > > > > > > > > > > resources
> > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > operators.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Key changes proposed in
> the
> > > FLIP
> > > > > > are
> > > > > > > as
> > > > > > > > > > > > follows:
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > - Forward GPU resource
> > > > > requirements
> > > > > > > to
> > > > > > > > > > > > > > > Yarn/Kubernetes.
> > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce GPUManager as
> > > one of
> > > > > > the
> > > > > > > task
> > > > > > > > > > > > manager
> > > > > > > > > > > > > > > > > services to
> > > > > > > > > > > > > > > > > > > > > > > discover
> > > > > > > > > > > > > > > > > > > > > > > > > > and expose GPU resource
> > > > > information
> > > > > > > to
> > > > > > > > > the
> > > > > > > > > > > > > context
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > > > > > functions.
> > > > > > > > > > > > > > > > > > > > > > > > > > - Introduce the default
> > > script
> > > > > for
> > > > > > > GPU
> > > > > > > > > > > > discovery,
> > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > which we
> > > > > > > > > > > > > > > > > > > > > > provide
> > > > > > > > > > > > > > > > > > > > > > > > > > the privilege mode to
> help
> > > user
> > > > > to
> > > > > > > > > achieve
> > > > > > > > > > > > > > > worker-level
> > > > > > > > > > > > > > > > > > > isolation
> > > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > > > > standalone mode.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Please find more details
> in
> > > the
> > > > > > FLIP
> > > > > > > wiki
> > > > > > > > > > > > > document
> > > > > > > > > > > > > > > [1].
> > > > > > > > > > > > > > > > > > > Looking
> > > > > > > > > > > > > > > > > > > > > > > forward
> > > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > > > > your feedback.
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > [1]
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-108%3A+Add+GPU+support+in+Flink
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>
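
The thread above distinguishes subtask-level placement (done by the global RM) from operator-level assignment inside a TM ("GPU 1 for operator A, GPU 2 for operator B"). A minimal sketch of that operator-level step follows, assuming the discovered GPU indexes arrive as a plain list of strings. The class and method names (OperatorResourceAssignerSketch, GpuAssigner, assign, release) are hypothetical illustrations and are not part of Flink or of FLIP-108.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an operator-level resource assigner; not a Flink API. */
public class OperatorResourceAssignerSketch {

    /** Hands out discovered GPU indexes so that each operator gets an exclusive set. */
    static final class GpuAssigner {
        // Indexes that no operator currently holds, in discovery order.
        private final Deque<String> free = new ArrayDeque<>();
        // Operator id -> GPU indexes currently bound to that operator.
        private final Map<String, List<String>> assigned = new HashMap<>();

        GpuAssigner(List<String> discoveredIndexes) {
            free.addAll(discoveredIndexes);
        }

        /** Binds {@code amount} free GPUs to the operator, or fails if too few are left. */
        synchronized List<String> assign(String operatorId, int amount) {
            if (assigned.containsKey(operatorId)) {
                throw new IllegalStateException("Operator already holds GPUs: " + operatorId);
            }
            if (amount > free.size()) {
                throw new IllegalStateException(
                        "Requested " + amount + " GPUs but only " + free.size() + " are free");
            }
            List<String> result = new ArrayList<>();
            for (int i = 0; i < amount; i++) {
                result.add(free.poll());
            }
            assigned.put(operatorId, result);
            return result;
        }

        /** Returns the operator's GPUs to the free pool; a no-op for unknown operators. */
        synchronized void release(String operatorId) {
            List<String> devices = assigned.remove(operatorId);
            if (devices != null) {
                free.addAll(devices);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed input: indexes as a discovery script might report them.
        GpuAssigner assigner = new GpuAssigner(Arrays.asList("0", "1"));
        System.out.println("operator A -> " + assigner.assign("A", 1));
        System.out.println("operator B -> " + assigner.assign("B", 1));
    }
}
```

Keeping the bookkeeping inside one self-contained object is in the spirit of the "GPUManager.get()" suggestion quoted above: operators would only ask for their devices and never touch TaskManager internals.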
