Re: [DISCUSS] FLIP-291: Externalized Declarative Resource Management

Matthias Pohl Fri, 03 Feb 2023 04:18:17 -0800

Thanks David for creating this FLIP. It sounds promising and useful to
have. Here are some thoughts from my side (some of them might rather be
follow-ups and not necessarily part of this FLIP):
- I'm wondering whether it makes sense to add some kind of resource ID to
the REST API. This would give Flink a tool to verify the PATCH request of
the external system in a compare-and-set kind of manner. AFAIU, the process
requires the external system to retrieve the resource requirements first
(to retrieve the vertex IDs). A resource ID <ABC> would be sent along as a
unique identifier for the provided setup. It's essentially the version ID
of the currently deployed resource requirement configuration. Flink doesn't
know whether the external system would use the provided information in some
way to derive a new set of resource requirements for this job. The
subsequent PATCH request with updated resource requirements would include
the previously retrieved resource ID <ABC>. The PATCH call would fail if
there was a concurrent PATCH call in between, indicating to the external
system that the resource requirements were concurrently updated.
- How often do we allow resource requirements to be changed? That question
might make my previous comment on the resource ID obsolete because we could
just make any PATCH call fail if there was a resource requirement update
within a certain time frame before the request. But such a time period is
something we might want to make configurable then, I guess.
- Versioning the JobGraph in the JobGraphStore rather than overwriting it
might be an idea. This would enable us to provide resource requirement
changes in the UI or through the REST API. It is related to a problem
around keeping track of the exception history within the AdaptiveScheduler
and also having to consider multiple versions of a JobGraph. But for that
one, we use the ExecutionGraphInfoStore right now.
- Updating the JobGraph in the JobGraphStore makes sense. I'm just
wondering whether we bundle two things together that are actually separate:
The business logic and the execution configuration (the resource
requirements). I'm aware that this is not a flaw of the current FLIP but
rather something that was not necessary to address in the past because the
JobGraph was kind of static. I don't remember whether that was already
discussed while working on the AdaptiveScheduler for FLIP-160 [1]. Maybe
I'm missing some functionality here that requires us to have everything in
one place. But it feels unreasonable to update the entire JobGraph for
what could actually be just a "config change". This also matters for the
amount of data that can be stored in a ConfigMap/ZooKeeper node if
versioning the resource requirement changes, as proposed in my previous
item, is an option for us.
- Updating the JobGraphStore means adding more requests to the HA backend
API. There were some concerns shared in the past in the discussion thread
[2] for FLIP-270 [3] about putting pressure on the k8s API server with too
many calls. Even though that pressure is more likely to be caused by
checkpointing, I still wanted to bring it up. We're working on a
standardized performance test to prepare going forward with FLIP-270 [3]
right now.
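
To make the compare-and-set idea from the first item a bit more concrete,
here is a minimal in-memory sketch in Python. Everything in it is an
assumption made for illustration: the names RequirementsStore, get, patch
and StaleVersionError are hypothetical, the integer version stands in for
the resource ID <ABC>, and none of this is part of FLIP-291 or of Flink's
REST API. It only shows how a token handed out with the GET could guard a
later PATCH.

```python
import threading


class StaleVersionError(Exception):
    """Hypothetical error: the PATCH carried an outdated resource ID."""


class RequirementsStore:
    """Hypothetical in-memory stand-in for a job's resource requirements."""

    def __init__(self, requirements):
        self._lock = threading.Lock()
        self._requirements = dict(requirements)
        self._version = 0  # plays the role of the resource ID <ABC>

    def get(self):
        """Return the current version together with a copy of the requirements."""
        with self._lock:
            return self._version, dict(self._requirements)

    def patch(self, expected_version, new_requirements):
        """Apply the update only if nobody else updated since `expected_version`."""
        with self._lock:
            if expected_version != self._version:
                raise StaleVersionError(
                    f"expected version {expected_version}, current is {self._version}"
                )
            self._requirements = dict(new_requirements)
            self._version += 1
            return self._version


# Usage: the external system reads first, then patches with the version it saw.
store = RequirementsStore({"vertex-1": {"lower": 1, "upper": 4}})
seen_version, _ = store.get()
new_version = store.patch(seen_version, {"vertex-1": {"lower": 1, "upper": 8}})

# A second PATCH that still carries the old version is rejected.
try:
    store.patch(seen_version, {"vertex-1": {"lower": 1, "upper": 2}})
    rejected = False
except StaleVersionError:
    rejected = True
```

A rejected caller would have to re-read the requirements and decide again,
which is the behavior described in the first item.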


Best,
Matthias

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Adaptive+Scheduler
[2] https://lists.apache.org/thread/bm6rmxxk6fbrqfsgz71gvso58950d4mj
[3]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-270%3A+Repeatable+Cleanup+of+Checkpoints

On Fri, Feb 3, 2023 at 10:31 AM ConradJam <[email protected]> wrote:

> Hi David:
>
> Thank you for driving this FLIP, which helps reduce Flink downtime.
>
> I would like to share a few ideas on this FLIP:
>
>
>    - When the number of available slots is insufficient, can we stop users
>    from rescaling, or return an error telling the user "not enough
>    available slots to upgrade, please check your available slots"? Or we
>    could have a request switch (true/false) to allow this behavior.
>
>
>    - When a user updates the job vertex parallelism, I would like to have
>    an interface to query the current status of the parallelism update, so
>    that the user or a program can understand the current status.
>    - I would like to have an interface to query the current parallelism
>    update status. This would also help management tools such as the
>    *[1] Flink K8S Operator*.
>
>
> {
>   status: Failed
>   reason: "not enough available slots to upgrade, please check your available slots"
> }
>
>
>
>    - *Pending*: the job has joined the upgrade queue and will be updated
>    later
>    - *Rescaling*: the job is currently rescaling; wait for it to finish
>    - *Finished*: the rescaling has finished
>    - *Failed*: something went wrong, so the job cannot be upgraded
>
> I would like to add the above to the FLIP. What do you think?
>
>
>    1.
>    https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/
>
>
> David Morávek <[email protected]> wrote on Fri, Feb 3, 2023 at 16:42:
>
> > Hi everyone,
> >
> > This FLIP [1] introduces a new REST API for declaring resource
> > requirements for the Adaptive Scheduler. There seems to be a clear need
> > for this API based on the discussion on the "Reworking the Rescale API"
> > [2] thread.
> >
> > Before we get started, this work is heavily based on the prototype [3]
> > created by Till Rohrmann, and the FLIP is being published with his
> > consent. Big shoutout to him!
> >
> > Last but not least, thanks to Chesnay and Roman for the initial reviews
> > and discussions.
> >
> > The best start would be watching a short demo [4] that I've recorded,
> > which illustrates newly added capabilities (rescaling the running job,
> > handing back resources to the RM, and session cluster support).
> >
> > The intuition behind the FLIP is being able to define resource
> > requirements ("resource boundaries") externally that the
> > AdaptiveScheduler can navigate within. This is a building block for
> > higher-level efforts such as an external Autoscaler. The natural
> > extension of this work would be to allow specifying per-vertex
> > ResourceProfiles.
> >
> > Looking forward to your thoughts; any feedback is appreciated!
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> > [2] https://lists.apache.org/thread/2f7dgr88xtbmsohtr0f6wmsvw8sw04f5
> > [3] https://github.com/tillrohrmann/flink/tree/autoscaling
> > [4] https://drive.google.com/file/d/1Vp8W-7Zk_iKXPTAiBT-eLPmCMd_I57Ty/view
> >
> > Best,
> > D.
> >
>
>
> --
> Best
>
> ConradJam
>
