Thanks David for creating this FLIP. It sounds promising and useful to have. Here are some thoughts from my side (some of them might rather be follow-ups and not necessarily part of this FLIP):

- I'm wondering whether it makes sense to add some kind of resource ID to the REST API. This would give Flink a tool to verify the PATCH request of the external system in a compare-and-set kind of manner. AFAIU, the process requires the external system to retrieve the resource requirements first (to retrieve the vertex IDs). A resource ID <ABC> would be sent along as a unique identifier for the provided setup. It's essentially the version ID of the currently deployed resource requirement configuration. Flink doesn't know whether the external system would use the provided information in some way to derive a new set of resource requirements for this job. The subsequent PATCH request with updated resource requirements would include the previously retrieved resource ID <ABC>. The PATCH call would fail if there was a concurrent PATCH call in between, indicating to the external system that the resource requirements were concurrently updated.
- How often do we allow resource requirements to be changed? That question might make my previous comment on the resource ID obsolete, because we could just make any PATCH call fail if there was a resource requirement update within a certain time frame before the request. But such a time period is something we might want to make configurable then, I guess.
- Versioning the JobGraph in the JobGraphStore rather than overwriting it might be an idea. This would enable us to expose resource requirement changes in the UI or through the REST API. It is related to a problem around keeping track of the exception history within the AdaptiveScheduler, which also has to consider multiple versions of a JobGraph. But for that one, we use the ExecutionGraphInfoStore right now.
- Updating the JobGraph in the JobGraphStore makes sense. I'm just wondering whether we bundle two things together that are actually separate: the business logic and the execution configuration (the resource requirements). I'm aware that this is not a flaw of the current FLIP but rather something that was not necessary to address in the past, because the JobGraph was kind of static. I don't remember whether that was already discussed while working on the AdaptiveScheduler for FLIP-160 [1]. Maybe I'm missing some functionality here that requires us to have everything in one place. But updating the entire JobGraph for what is actually a "config change" doesn't feel reasonable, also considering the amount of data that can be stored in a ConfigMap/ZooKeeper node if versioning the resource requirement changes, as proposed in my previous item, is an option for us.
- Updating the JobGraphStore means adding more requests to the HA backend API. Concerns about pressuring the k8s API server with too many calls were shared in the past in the discussion thread [2] for FLIP-270 [3]. Even though that is more likely to be caused by checkpointing, I still wanted to bring it up. We're currently working on a standardized performance test to prepare for moving forward with FLIP-270 [3].
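To make the first three items more concrete, here is a minimal Java sketch that combines them: a version ID (the resource ID <ABC>) checked in a compare-and-set manner, a minimum interval between updates, and an append-only history instead of overwriting. Everything in it (the class name, the methods, modelling requirements as a vertex-ID-to-parallelism map, keeping versions in memory) is a hypothetical illustration and not part of Flink or the FLIP; a real implementation would persist versions through the HA backend.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Flink code: an in-memory, append-only history of
// resource requirement versions (vertex ID -> desired parallelism).
class VersionedRequirementsStore {

    /** Possible outcomes of a PATCH-like update. */
    enum UpdateResult { ACCEPTED, CONFLICT, TOO_FREQUENT }

    /** One immutable version; id plays the role of the resource ID <ABC>. */
    record Version(long id, Instant updatedAt, Map<String, Integer> parallelism) {}

    private final Duration minUpdateInterval;
    private final List<Version> history = new ArrayList<>();

    VersionedRequirementsStore(Map<String, Integer> initial, Instant createdAt, Duration minUpdateInterval) {
        this.minUpdateInterval = minUpdateInterval;
        history.add(new Version(0L, createdAt, Map.copyOf(initial)));
    }

    /** GET: the external system reads the current requirements together with their version ID. */
    synchronized Version current() {
        return history.get(history.size() - 1);
    }

    /**
     * PATCH: accepted only if the caller based its change on the latest version
     * (compare-and-set) and the previous change is at least minUpdateInterval old.
     */
    synchronized UpdateResult update(long expectedVersionId, Map<String, Integer> requested, Instant now) {
        Version latest = current();
        if (latest.id() != expectedVersionId) {
            // Another update happened since the caller's GET.
            return UpdateResult.CONFLICT;
        }
        if (Duration.between(latest.updatedAt(), now).compareTo(minUpdateInterval) < 0) {
            // Rate limit: the last change is too recent.
            return UpdateResult.TOO_FREQUENT;
        }
        // Append instead of overwriting, so earlier versions remain inspectable.
        history.add(new Version(latest.id() + 1, now, Map.copyOf(requested)));
        return UpdateResult.ACCEPTED;
    }

    /** All versions, oldest first (e.g. to show requirement changes in a UI). */
    synchronized List<Version> history() {
        return List.copyOf(history);
    }
}
```

With a 30-second interval, for example, a client that read version 0 and sends its PATCH 60 seconds after creation gets ACCEPTED; a second client still holding version 0 then gets CONFLICT and has to re-read before retrying. The two checks are independent, so either one could be dropped if the time window alone turns out to be sufficient.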
Best,
Matthias

[1]
[2]
[3]

On Fri, Feb 3, 2023 at 10:31 AM ConradJam wrote:
> Hi David:
>
> Thank you for driving this FLIP, which helps reduce Flink's downtime.
>
> For this FLIP, I would like to share a few ideas:
>
> - When the number of "slots" is insufficient, can we stop users from
> rescaling, or return something that tells the user "not enough available
> slots to upgrade, please check your available slots"? Or we could have a
> request switch (true/false) to allow this behavior.
>
> - When the user upgrades the job vertex parallelism, I want to have an
> interface to query the current execution status of the parallelism update,
> so that the user or a program can understand the current status. This would
> also help with management similar to the *[1] Flink K8S Operator*.
>
> {
>   "status": "Failed",
>   "reason": "not enough available slots to upgrade, please check your available slots"
> }
>
> - *Pending*: the job has joined the upgrade queue; it will be updated later
> - *Rescaling*: the job is rescaling; wait for it to finish
> - *Finished*: the update is done
> - *Failed*: something went wrong, so this job cannot be upgraded
>
> I'd like to add the content above to the FLIP, what do you think?
>
> [1]
>
> David Morávek wrote on Fri, Feb 3, 2023 at 16:42:
>
> > Hi everyone,
> >
> > This FLIP [1] introduces a new REST API for declaring resource requirements
> > for the Adaptive Scheduler. There seems to be a clear need for this API
> > based on the discussion on the "Reworking the Rescale API" [2] thread.
> >
> > Before we get started, this work is heavily based on the prototype [3]
> > created by Till Rohrmann, and the FLIP is being published with his consent.
> > Big shoutout to him!
> >
> > Last but not least, thanks to Chesnay and Roman for the initial reviews and
> > discussions.
> >
> > The best start would be watching a short demo [4] that I've recorded, which
> > illustrates the newly added capabilities (rescaling the running job, handing
> > back resources to the RM, and session cluster support).
> >
> > The intuition behind the FLIP is being able to define resource requirements
> > ("resource boundaries") externally that the AdaptiveScheduler can navigate
> > within. This is a building block for higher-level efforts such as an
> > external Autoscaler. The natural extension of this work would be to allow
> > specifying per-vertex ResourceProfiles.
> >
> > Looking forward to your thoughts; any feedback is appreciated!
> >
> > [1]
> > [2]
> > [3]
> > [4]
> >
> > Best,
> > D.
>
> --
> Best
>
> ConradJam
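The quoted messages above pair naturally: the FLIP's "resource boundaries" that the AdaptiveScheduler navigates within, and ConradJam's suggestion to reject a request with a status and a reason when there are not enough slots. The following is a minimal Java sketch of that combination. The names (RescaleRequestChecker, Bounds, Outcome) are hypothetical, and it assumes that all vertices share one slot-sharing group, so a vertex with parallelism p needs p slots; it is neither the FLIP's REST schema nor the AdaptiveScheduler's actual slot allocation logic.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Flink's API: per-vertex parallelism bounds plus the
// request statuses proposed in the thread.
class RescaleRequestChecker {

    /** Statuses proposed in the thread; only RESCALING and FAILED are produced here. */
    enum Status { PENDING, RESCALING, FINISHED, FAILED }

    /** "Resource boundaries" for one vertex: any parallelism in [lowerBound, upperBound] is acceptable. */
    record Bounds(int lowerBound, int upperBound) {
        Bounds {
            if (lowerBound < 1 || upperBound < lowerBound) {
                throw new IllegalArgumentException("expected 1 <= lowerBound <= upperBound");
            }
        }
    }

    /** Status, human-readable reason, and the parallelism chosen per vertex. */
    record Outcome(Status status, String reason, Map<String, Integer> assigned) {}

    /**
     * Assumes all vertices share one slot-sharing group, so running a vertex
     * with parallelism p needs p slots and the job needs max(p) slots overall.
     */
    static Outcome check(Map<String, Bounds> requirements, int availableSlots) {
        Map<String, Integer> assigned = new LinkedHashMap<>();
        for (Map.Entry<String, Bounds> entry : requirements.entrySet()) {
            Bounds bounds = entry.getValue();
            if (bounds.lowerBound() > availableSlots) {
                // Reject immediately instead of waiting for more slots.
                return new Outcome(Status.FAILED,
                        "not enough available slots: vertex " + entry.getKey() + " needs at least "
                                + bounds.lowerBound() + " but only " + availableSlots + " are available",
                        Map.of());
            }
            // Use as many slots as are available, capped by the upper bound.
            assigned.put(entry.getKey(), Math.min(bounds.upperBound(), availableSlots));
        }
        return new Outcome(Status.RESCALING, "", assigned);
    }
}
```

For example, with bounds [1, 4] for a "source" vertex and [2, 8] for a "sink" vertex, six available slots yield parallelism 4 and 6 with status RESCALING, while a single slot yields FAILED because the sink's lower bound of 2 cannot be met. Rejecting right away corresponds to one setting of the request switch ConradJam mentions; the other setting would keep the request PENDING until enough slots arrive.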