I missed that the FLIP states: > Currently, even though we’d expose the lower bound for clarity and API > completeness, we won’t allow setting it to any other value than one until we > have full support throughout the stack.
How hard would it be to address this issue in the FLIP? There is not much value to offer setting a lower bound which won't be respected / throw an error when it is set. If we had support for a lower bound, we could enforce a resource contract externally via setting lowerBound == upperBound. That ties back to the Rescale API discussion we had. I want to better understand what the major concerns would be around allowing this. Just to outline how I imagine the logic to work: A) The resource constraints are already met => Nothing changes B) More resources available than required => Cancel the job, drop as many superfluous task managers as needed, restart the job C) Less resources available than required => Acquire new task managers, wait for them to register, cancel and restart the job I'm open to helping out with the implementation. -Max On Mon, Feb 13, 2023 at 7:45 PM Maximilian Michels <m...@apache.org> wrote: > > Based on further discussion I had with Chesnay on this PR [1], I think > jobs would currently go into a restarting state after the resource > requirements have changed. This wouldn't achieve what we had in mind, > i.e. sticking to the old resource requirements until enough slots are > available to fulfil the new resource requirements. So this may not be > 100% what we need but it could be extended to do what we want. > > -Max > > [1] https://github.com/apache/flink/pull/21908#discussion_r1104792362 > > On Mon, Feb 13, 2023 at 7:16 PM Maximilian Michels <m...@apache.org> wrote: > > > > Hi David, > > > > This is awesome! Great writeup and demo. This is pretty much what we > > need for the autoscaler as part of the Flink Kubernetes operator [1]. > > Scaling Flink jobs effectively is hard but fortunately we have solved > > the issue as part of the Flink Kubernetes operator. The only critical > > piece we are missing is a better way to execute scaling decisions, as > > discussed in [2]. > > > > Looking at your proposal, we would set lowerBound == upperBound for > > the parallelism because we want to fully determine the parallelism > > externally based on the scaling metrics. Does that sound right? > > > > What is the timeline for these changes? Is there a JIRA? > > > > Cheers, > > Max > > > > [1] > > https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/ > > [2] https://lists.apache.org/thread/2f7dgr88xtbmsohtr0f6wmsvw8sw04f5 > > > > On Mon, Feb 13, 2023 at 1:16 PM feng xiangyu <xiangyu...@gmail.com> wrote: > > > > > > Hi David, > > > > > > Thanks for your reply. I think your response totally make sense. This > > > flip targets on declaring required resource to ResourceManager instead of > > > using ResourceManager to add/remove TMs directly. > > > > > > Best, > > > Xiangyu > > > > > > > > > > > > David Morávek <david.mora...@gmail.com> 于2023年2月13日周一 15:46写道: > > > > > > > Hi everyone, > > > > > > > > @Shammon > > > > > > > > I'm not entirely sure what "config file" you're referring to. You can, > > > > of > > > > course, override the default parallelism in "flink-conf.yaml", but for > > > > sinks and sources, the parallelism needs to be tweaked on the connector > > > > level ("WITH" statement). > > > > > > > > This is something that should be achieved with tooling around Flink. We > > > > want to provide an API on the lowest level that generalizes well. > > > > Achieving > > > > what you're describing should be straightforward with this API. > > > > > > > > @Xiangyu > > > > > > > > Is it possible for this REST API to declare TM resources in the future? > > > > > > > > > > > > Would you like to add/remove TMs if you use an active Resource Manager? > > > > This would be out of the scope of this effort since it targets the > > > > scheduler component only (we make no assumptions about the used Resource > > > > Manager). Also, the AdaptiveScheduler is only intended to be used for > > > > Streaming. > > > > > > > > And for streaming jobs, I'm wondering if there is any situation we > > > > need to > > > > > rescale the TM resources of a flink cluster at first and then the > > > > adaptive > > > > > scheduler will rescale the per-vertex ResourceProfiles accordingly. > > > > > > > > > > > > > We plan on adding support for the ResourceProfiles (dynamic slot > > > > allocation) as the next step. Again we won't make any assumptions about > > > > the > > > > used Resource Manager. In other words, this effort ends by declaring > > > > desired resources to the Resource Manager. > > > > > > > > Does that make sense? > > > > > > > > @Matthias > > > > > > > > We've done another pass on the proposed API and currently lean towards > > > > having an idempotent PUT API. > > > > - We don't care too much about multiple writers' scenarios in terms of > > > > who > > > > can write an authoritative payload; this is up to the user of the API to > > > > figure out > > > > - It's indeed tricky to achieve atomicity with PATCH API; switching to > > > > PUT > > > > API seems to do the trick > > > > - We won't allow partial "payloads" anymore, meaning you need to define > > > > requirements for all vertices in the JobGraph; This is completely fine > > > > for > > > > the programmatic workflows. For DEBUG / DEMO purposes, you can use the > > > > GET > > > > endpoint and tweak the response to avoid writing the whole payload by > > > > hand. > > > > > > > > WDYT? > > > > > > > > > > > > Best, > > > > D. > > > > > > > > On Fri, Feb 10, 2023 at 11:21 AM feng xiangyu <xiangyu...@gmail.com> > > > > wrote: > > > > > > > > > Hi David, > > > > > > > > > > Thanks for creating this flip. I think this work it is very useful, > > > > > especially in autoscaling scenario. I would like to share some > > > > > questions > > > > > from my view. > > > > > > > > > > 1, Is it possible for this REST API to declare TM resources in the > > > > future? > > > > > I'm asking because we are building the autoscaling feature for Flink > > > > > OLAP > > > > > Session Cluster in ByteDance. We need to rescale the cluster's > > > > > resource > > > > on > > > > > TM level instead of Job level. It would be very helpful if we have a > > > > > REST > > > > > API for out external Autoscaling service to use. > > > > > > > > > > 2, And for streaming jobs, I'm wondering if there is any situation we > > > > need > > > > > to rescale the TM resources of a flink cluster at first and then the > > > > > adaptive scheduler will rescale the per-vertex ResourceProfiles > > > > > accordingly. > > > > > > > > > > best. > > > > > Xiangyu > > > > > > > > > > Shammon FY <zjur...@gmail.com> 于2023年2月9日周四 11:31写道: > > > > > > > > > > > Hi David > > > > > > > > > > > > Thanks for your answer. > > > > > > > > > > > > > Can you elaborate more about how you'd intend to use the > > > > > > > endpoint? I > > > > > > think we can ultimately introduce a way of re-declaring "per-vertex > > > > > > defaults," but I'd like to understand the use case bit more first. > > > > > > > > > > > > For this issue, I mainly consider the consistency of user > > > > > > configuration > > > > > and > > > > > > job runtime. For sql jobs, users usually set specific parallelism > > > > > > for > > > > > > source and sink, and set a global parallelism for other operators. > > > > These > > > > > > config items are stored in a config file. For some high-priority > > > > > > jobs, > > > > > > users may want to manage them manually. > > > > > > 1. When users need to scale the parallelism, they should update the > > > > > config > > > > > > file and restart flink job, which may take a long time. > > > > > > 2. After providing the REST API, users can just send a request to > > > > > > the > > > > job > > > > > > via REST API quickly after updating the config file. > > > > > > The configuration in the running job and config file should be the > > > > same. > > > > > > What do you think of this? > > > > > > > > > > > > best. > > > > > > Shammon > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Feb 7, 2023 at 4:51 PM David Morávek > > > > > > <david.mora...@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > Let's try to answer the questions one by one. > > > > > > > > > > > > > > *@ConradJam* > > > > > > > > > > > > > > when the number of "slots" is insufficient, can we can stop users > > > > > > rescaling > > > > > > > > or throw something to tell user "less avaliable slots to > > > > > > > > upgrade, > > > > > > please > > > > > > > > checkout your alivalbe slots" ? > > > > > > > > > > > > > > > > > > > > > > The main property of AdaptiveScheduler is that it can adapt to > > > > > "available > > > > > > > resources," which means you're still able to make progress even > > > > though > > > > > > you > > > > > > > didn't get all the slots you've asked for. Let's break down the > > > > > > > pros > > > > > and > > > > > > > cons of this property. > > > > > > > > > > > > > > - (plus) If you lose a TM for some reason, you can still recover > > > > > > > even > > > > > if > > > > > > it > > > > > > > doesn't come back. We still need to give it some time to eliminate > > > > > > > unnecessary rescaling, which can be controlled by setting > > > > > > > "resource-stabilization-timeout." > > > > > > > - (plus) The resources can arrive with a significant delay. For > > > > > example, > > > > > > > you're unable to spawn enough TMs on time because you've run out > > > > > > > of > > > > > > > resources in your k8s cluster, and you need to wait for the > > > > > > > cluster > > > > > auto > > > > > > > scaler to kick in and add new nodes to the cluster. In this > > > > > > > scenario, > > > > > > > you'll be able to start making progress faster, at the cost of > > > > multiple > > > > > > > rescalings (once the remaining resources arrive). > > > > > > > - (plus) This plays well with the declarative manner of today's > > > > > > > infrastructure. For example, you tell k8s that you need 10 TMs, > > > > > > > and > > > > > > you'll > > > > > > > eventually get them. > > > > > > > - (minus) In the case of large state jobs, the cost of multiple > > > > > > rescalings > > > > > > > might outweigh the above. > > > > > > > > > > > > > > We've already touched on the solution to this problem on the FLIP. > > > > > Please > > > > > > > notice the parallelism knob being a range with a lower and upper > > > > bound. > > > > > > > Setting both the lower and upper bound to the same value could > > > > > > > give > > > > the > > > > > > > behavior you're describing at the cost of giving up some > > > > > > > properties > > > > > that > > > > > > AS > > > > > > > gives you (you'd be falling back to the DefaultScheduler's > > > > > > > behavior). > > > > > > > > > > > > > > when user upgrade job-vertx-parallelism . I want to have an > > > > > > > interface > > > > > to > > > > > > > > query the current update parallel execution status, so that the > > > > user > > > > > or > > > > > > > > program can understand the current status > > > > > > > > > > > > > > > > > > > > > > This is a misunderstanding. We're not introducing the RESCALE > > > > endpoint. > > > > > > > This endpoint allows you to re-declare the resources needed to run > > > > the > > > > > > job. > > > > > > > Once you reach the desired resources (you get more resources than > > > > > > > the > > > > > > lower > > > > > > > bound defines), your job will run. > > > > > > > > > > > > > > We can expose a similar endpoint to "resource requirements" to > > > > > > > give > > > > you > > > > > > an > > > > > > > overview of the resources the vertices already have. You can > > > > > > > already > > > > > get > > > > > > > this from the REST API, so exposing this in yet another way > > > > > > > should be > > > > > > > considered carefully. > > > > > > > > > > > > > > *@Matthias* > > > > > > > > > > > > > > I'm wondering whether it makes sense to add some kind of resource > > > > > > > ID > > > > to > > > > > > the > > > > > > > > REST API. > > > > > > > > > > > > > > > > > > > > > That's a good question. I want to think about that and get back to > > > > the > > > > > > > question later. My main struggle when thinking about this is, "if > > > > this > > > > > > > would be an idempotent POST endpoint," would it be any different? > > > > > > > > > > > > > > How often do we allow resource requirements to be changed? > > > > > > > > > > > > > > > > > > > > > There shall be no rate limiting on the FLINK side. If this is > > > > something > > > > > > > your environment needs, you can achieve it on a different layer > > > > > > > ("we > > > > > > can't > > > > > > > have FLINK to do everything"). > > > > > > > > > > > > > > Versioning the JobGraph in the JobGraphStore rather than > > > > > > > overwriting > > > > it > > > > > > > > might be an idea. > > > > > > > > > > > > > > > > > > > > > > This sounds interesting since it would be closer to the JobGraph > > > > being > > > > > > > immutable. The main problem I see here is that this would > > > > > > > introduce a > > > > > > > BW-incompatible change so it might be a topic for follow-up FLIP. > > > > > > > > > > > > > > I'm just wondering whether we bundle two things together that are > > > > > > actually > > > > > > > > separate > > > > > > > > > > > > > > > > > > > > > > Yup, this is how we think about it as well. The main question is, > > > > "who > > > > > > > should be responsible for bookkeeping 1) the JobGraph and 2) the > > > > > > > JobResourceRequirements". The JobMaster would be the right place > > > > > > > for > > > > > > both, > > > > > > > but it's currently not the case, and we're tightly coupling the > > > > > > dispatcher > > > > > > > with the JobMaster. > > > > > > > > > > > > > > Initially, we tried to introduce a separate HA component in > > > > > > > JobMaster > > > > > for > > > > > > > bookkeeping the JobResourceRequirements, but that proved to be a > > > > > > > more > > > > > > > significant effort adding additional mess to the already messy HA > > > > > > > ecosystem. Another approach we've discussed was mutating the > > > > > > > JobGraph > > > > > and > > > > > > > setting JRR into the JobGraph structure itself. > > > > > > > > > > > > > > The middle ground for keeping this effort reasonably sized and not > > > > > > > violating "we want to keep JG immutable" too much is keeping the > > > > > > > JobResourceRequirements separate as an internal config option in > > > > > > JobGraph's > > > > > > > configuration. > > > > > > > > > > > > > > We ultimately need to rethink the tight coupling of Dispatcher and > > > > > > > JobMaster, but it needs to be a separate effort. > > > > > > > > > > > > > > ...also considering the amount of data that can be stored in a > > > > > > > > ConfigMap/ZooKeeper node if versioning the resource requirement > > > > > change > > > > > > as > > > > > > > > proposed in my previous item is an option for us. > > > > > > > > > > > > > > > > > > > > > > AFAIK we're only storing pointers to the S3 objects in HA > > > > > > > metadata, > > > > so > > > > > we > > > > > > > should be okay with having larger structures for now. > > > > > > > > > > > > > > Updating the JobGraphStore means adding more requests to the HA > > > > backend > > > > > > > API. > > > > > > > > > > > > > > > > > > > > > > It's fine unless you intend to override the resource requirements > > > > > > > a > > > > few > > > > > > > times per second. > > > > > > > > > > > > > > *@Shammon* > > > > > > > > > > > > > > How about adding some more information such as vertex type > > > > > > > > > > > > > > > > > > > > > > Since it was intended as a "debug" endpoint, it makes complete > > > > > > > sense! > > > > > > > > > > > > > > For sql jobs, we always use a unified parallelism for most > > > > > > > vertices. > > > > > Can > > > > > > > > we provide them with a more convenient setting method instead of > > > > each > > > > > > > one? > > > > > > > > > > > > > > > > > > > > > I completely feel with this. The main thoughts when designing the > > > > > > > API > > > > > > were: > > > > > > > - We want to keep it clean and easy to understand. > > > > > > > - Global parallelism can be modeled using per-vertex parallelism > > > > > > > but > > > > > not > > > > > > > the other way around. > > > > > > > - The API will be used by external tooling (operator, auto > > > > > > > scaler). > > > > > > > > > > > > > > Can you elaborate more about how you'd intend to use the > > > > > > > endpoint? I > > > > > > think > > > > > > > we can ultimately introduce a way of re-declaring "per-vertex > > > > > defaults," > > > > > > > but I'd like to understand the use case bit more first. > > > > > > > > > > > > > > *@Weijie* > > > > > > > > > > > > > > What is the default value here (based on what configuration), or > > > > > > > just > > > > > > > > infinite? > > > > > > > > > > > > > > > > > > > > > > Currently, for the lower bound, it's always one, and for the upper > > > > > bound, > > > > > > > it's either parallelism (if defined) or the maxParallelism of the > > > > > vertex > > > > > > in > > > > > > > JobGraph. This question might be another signal for making the > > > > defaults > > > > > > > explicit (see the answer to Shammon's question above). > > > > > > > > > > > > > > > > > > > > > Thanks, everyone, for your initial thoughts! > > > > > > > > > > > > > > Best, > > > > > > > D. > > > > > > > > > > > > > > On Tue, Feb 7, 2023 at 4:39 AM weijie guo > > > > > > > <guoweijieres...@gmail.com > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > Thanks David for driving this. This is a very valuable work, > > > > > especially > > > > > > > for > > > > > > > > cloud native environment. > > > > > > > > > > > > > > > > >> How about adding some more information such as vertex type > > > > > > > > (SOURCE/MAP/JOIN and .etc) in the response of `get jobs > > > > > > > > resource-requirements`? For users, only vertex-id may be > > > > > > > > difficult > > > > to > > > > > > > > understand. > > > > > > > > > > > > > > > > +1 for this suggestion, including jobvertex's name in the > > > > > > > > response > > > > > body > > > > > > > is > > > > > > > > more > > > > > > > > user-friendly. > > > > > > > > > > > > > > > > > > > > > > > > I saw this sentence in FLIP: "Setting the upper bound to -1 will > > > > > reset > > > > > > > the > > > > > > > > value to the default setting." What is the default value here > > > > (based > > > > > > on > > > > > > > > what configuration), or just infinite? > > > > > > > > > > > > > > > > > > > > > > > > Best regards, > > > > > > > > > > > > > > > > Weijie > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Shammon FY <zjur...@gmail.com> 于2023年2月6日周一 18:06写道: > > > > > > > > > > > > > > > > > Hi David > > > > > > > > > > > > > > > > > > Thanks for initiating this discussion. I think declaring job > > > > > resource > > > > > > > > > requirements by REST API is very valuable. I just left some > > > > > comments > > > > > > as > > > > > > > > > followed > > > > > > > > > > > > > > > > > > 1) How about adding some more information such as vertex type > > > > > > > > > (SOURCE/MAP/JOIN and .etc) in the response of `get jobs > > > > > > > > > resource-requirements`? For users, only vertex-id may be > > > > difficult > > > > > to > > > > > > > > > understand. > > > > > > > > > > > > > > > > > > 2) For sql jobs, we always use a unified parallelism for most > > > > > > vertices. > > > > > > > > Can > > > > > > > > > we provide them with a more convenient setting method instead > > > > > > > > > of > > > > > each > > > > > > > > one? > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Shammon > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Feb 3, 2023 at 8:18 PM Matthias Pohl < > > > > > matthias.p...@aiven.io > > > > > > > > > .invalid> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Thanks David for creating this FLIP. It sounds promising and > > > > > useful > > > > > > > to > > > > > > > > > > have. Here are some thoughts from my side (some of them > > > > > > > > > > might > > > > be > > > > > > > > rather a > > > > > > > > > > follow-up and not necessarily part of this FLIP): > > > > > > > > > > - I'm wondering whether it makes sense to add some kind of > > > > > resource > > > > > > > ID > > > > > > > > to > > > > > > > > > > the REST API. This would give Flink a tool to verify the > > > > > > > > > > PATCH > > > > > > > request > > > > > > > > of > > > > > > > > > > the external system in a compare-and-set kind of manner. > > > > > > > > > > AFAIU, > > > > > the > > > > > > > > > process > > > > > > > > > > requires the external system to retrieve the resource > > > > > requirements > > > > > > > > first > > > > > > > > > > (to retrieve the vertex IDs). A resource ID <ABC> would be > > > > > > > > > > sent > > > > > > along > > > > > > > > as > > > > > > > > > a > > > > > > > > > > unique identifier for the provided setup. It's essentially > > > > > > > > > > the > > > > > > > version > > > > > > > > ID > > > > > > > > > > of the currently deployed resource requirement > > > > > > > > > > configuration. > > > > > Flink > > > > > > > > > doesn't > > > > > > > > > > know whether the external system would use the provided > > > > > information > > > > > > > in > > > > > > > > > some > > > > > > > > > > way to derive a new set of resource requirements for this > > > > > > > > > > job. > > > > > The > > > > > > > > > > subsequent PATCH request with updated resource requirements > > > > would > > > > > > > > include > > > > > > > > > > the previously retrieved resource ID <ABC>. The PATCH call > > > > would > > > > > > fail > > > > > > > > if > > > > > > > > > > there was a concurrent PATCH call in between indicating to > > > > > > > > > > the > > > > > > > external > > > > > > > > > > system that the resource requirements were concurrently > > > > updated. > > > > > > > > > > - How often do we allow resource requirements to be changed? > > > > That > > > > > > > > > question > > > > > > > > > > might make my previous comment on the resource ID obsolete > > > > > because > > > > > > we > > > > > > > > > could > > > > > > > > > > just make any PATCH call fail if there was a resource > > > > requirement > > > > > > > > update > > > > > > > > > > within a certain time frame before the request. But such a > > > > > > > > > > time > > > > > > > period > > > > > > > > is > > > > > > > > > > something we might want to make configurable then, I guess. > > > > > > > > > > - Versioning the JobGraph in the JobGraphStore rather than > > > > > > > overwriting > > > > > > > > it > > > > > > > > > > might be an idea. This would enable us to provide resource > > > > > > > requirement > > > > > > > > > > changes in the UI or through the REST API. It is related to > > > > > > > > > > a > > > > > > problem > > > > > > > > > > around keeping track of the exception history within the > > > > > > > > > AdaptiveScheduler > > > > > > > > > > and also having to consider multiple versions of a JobGraph. > > > > But > > > > > > for > > > > > > > > that > > > > > > > > > > one, we use the ExecutionGraphInfoStore right now. > > > > > > > > > > - Updating the JobGraph in the JobGraphStore makes sense. > > > > > > > > > > I'm > > > > > just > > > > > > > > > > wondering whether we bundle two things together that are > > > > actually > > > > > > > > > separate: > > > > > > > > > > The business logic and the execution configuration (the > > > > resource > > > > > > > > > > requirements). I'm aware that this is not a flaw of the > > > > > > > > > > current > > > > > > FLIP > > > > > > > > but > > > > > > > > > > rather something that was not necessary to address in the > > > > > > > > > > past > > > > > > > because > > > > > > > > > the > > > > > > > > > > JobGraph was kind of static. I don't remember whether that > > > > > > > > > > was > > > > > > > already > > > > > > > > > > discussed while working on the AdaptiveScheduler for > > > > > > > > > > FLIP-160 > > > > > [1]. > > > > > > > > Maybe, > > > > > > > > > > I'm missing some functionality here that requires us to have > > > > > > > everything > > > > > > > > > in > > > > > > > > > > one place. But it feels like updating the entire JobGraph > > > > > > > > > > which > > > > > > could > > > > > > > > be > > > > > > > > > > actually a "config change" is not reasonable. ...also > > > > considering > > > > > > the > > > > > > > > > > amount of data that can be stored in a ConfigMap/ZooKeeper > > > > > > > > > > node > > > > > if > > > > > > > > > > versioning the resource requirement change as proposed in my > > > > > > previous > > > > > > > > > item > > > > > > > > > > is an option for us. > > > > > > > > > > - Updating the JobGraphStore means adding more requests to > > > > > > > > > > the > > > > HA > > > > > > > > backend > > > > > > > > > > API. There were some concerns shared in the discussion > > > > > > > > > > thread > > > > [2] > > > > > > for > > > > > > > > > > FLIP-270 [3] on pressuring the k8s API server in the past > > > > > > > > > > with > > > > > too > > > > > > > many > > > > > > > > > > calls. Eventhough, it's more likely to be caused by > > > > > checkpointing, > > > > > > I > > > > > > > > > still > > > > > > > > > > wanted to bring it up. We're working on a standardized > > > > > performance > > > > > > > test > > > > > > > > > to > > > > > > > > > > prepare going forward with FLIP-270 [3] right now. > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Matthias > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Adaptive+Scheduler > > > > > > > > > > [2] > > > > > > https://lists.apache.org/thread/bm6rmxxk6fbrqfsgz71gvso58950d4mj > > > > > > > > > > [3] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-270%3A+Repeatable+Cleanup+of+Checkpoints > > > > > > > > > > > > > > > > > > > > On Fri, Feb 3, 2023 at 10:31 AM ConradJam > > > > > > > > > > <jam.gz...@gmail.com > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Hi David: > > > > > > > > > > > > > > > > > > > > > > Thank you for drive this flip, which helps less flink > > > > shutdown > > > > > > time > > > > > > > > > > > > > > > > > > > > > > for this flip, I would like to make a few idea on share > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - when the number of "slots" is insufficient, can we > > > > > > > > > > > can > > > > > stop > > > > > > > > users > > > > > > > > > > > rescaling or throw something to tell user "less > > > > > > > > > > > avaliable > > > > > > slots > > > > > > > to > > > > > > > > > > > upgrade, > > > > > > > > > > > please checkout your alivalbe slots" ? Or we could > > > > > > > > > > > have a > > > > > > > request > > > > > > > > > > > switch(true/false) to allow this behavior > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - when user upgrade job-vertx-parallelism . I want to > > > > > > > > > > > have > > > > > an > > > > > > > > > > interface > > > > > > > > > > > to query the current update parallel execution status, > > > > > > > > > > > so > > > > > that > > > > > > > the > > > > > > > > > > user > > > > > > > > > > > or > > > > > > > > > > > program can understand the current status > > > > > > > > > > > - I want to have an interface to query the current > > > > > > > > > > > update > > > > > > > > > parallelism > > > > > > > > > > > execution status. This also helps similar to *[1] Flink > > > > K8S > > > > > > > > > Operator* > > > > > > > > > > > management > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > { > > > > > > > > > > > status: Failed > > > > > > > > > > > reason: "less avaliable slots to upgrade, please > > > > > > > > > > > checkout > > > > > your > > > > > > > > > alivalbe > > > > > > > > > > > slots" > > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > - *Pending*: this job now is join the upgrade queue,it > > > > will > > > > > be > > > > > > > > > update > > > > > > > > > > > later > > > > > > > > > > > - *Rescaling*: job now is rescaling,wait it finish > > > > > > > > > > > - *Finished*: finish do it > > > > > > > > > > > - *Failed* : something have wrong,so this job is not > > > > > alivable > > > > > > > > > upgrade > > > > > > > > > > > > > > > > > > > > > > I want to supplement my above content in flip, what do you > > > > > think > > > > > > ? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > David Morávek <d...@apache.org> 于2023年2月3日周五 16:42写道: > > > > > > > > > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > > > > > This FLIP [1] introduces a new REST API for declaring > > > > > resource > > > > > > > > > > > requirements > > > > > > > > > > > > for the Adaptive Scheduler. There seems to be a clear > > > > > > > > > > > > need > > > > > for > > > > > > > this > > > > > > > > > API > > > > > > > > > > > > based on the discussion on the "Reworking the Rescale > > > > > > > > > > > > API" > > > > > [2] > > > > > > > > > thread. > > > > > > > > > > > > > > > > > > > > > > > > Before we get started, this work is heavily based on the > > > > > > > prototype > > > > > > > > > [3] > > > > > > > > > > > > created by Till Rohrmann, and the FLIP is being > > > > > > > > > > > > published > > > > > with > > > > > > > his > > > > > > > > > > > consent. > > > > > > > > > > > > Big shoutout to him! > > > > > > > > > > > > > > > > > > > > > > > > Last and not least, thanks to Chesnay and Roman for the > > > > > initial > > > > > > > > > reviews > > > > > > > > > > > and > > > > > > > > > > > > discussions. > > > > > > > > > > > > > > > > > > > > > > > > The best start would be watching a short demo [4] that > > > > > > > > > > > > I've > > > > > > > > recorded, > > > > > > > > > > > which > > > > > > > > > > > > illustrates newly added capabilities (rescaling the > > > > > > > > > > > > running > > > > > > job, > > > > > > > > > > handing > > > > > > > > > > > > back resources to the RM, and session cluster support). > > > > > > > > > > > > > > > > > > > > > > > > The intuition behind the FLIP is being able to define > > > > > resource > > > > > > > > > > > requirements > > > > > > > > > > > > ("resource boundaries") externally that the > > > > AdaptiveScheduler > > > > > > can > > > > > > > > > > > navigate > > > > > > > > > > > > within. This is a building block for higher-level > > > > > > > > > > > > efforts > > > > > such > > > > > > as > > > > > > > > an > > > > > > > > > > > > external Autoscaler. The natural extension of this work > > > > would > > > > > > be > > > > > > > to > > > > > > > > > > allow > > > > > > > > > > > > to specify per-vertex ResourceProfiles. > > > > > > > > > > > > > > > > > > > > > > > > Looking forward to your thoughts; any feedback is > > > > > appreciated! > > > > > > > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management > > > > > > > > > > > > [2] > > > > > > > > https://lists.apache.org/thread/2f7dgr88xtbmsohtr0f6wmsvw8sw04f5 > > > > > > > > > > > > [3] > > > > > > > > > > > > https://github.com/tillrohrmann/flink/tree/autoscaling > > > > > > > > > > > > [4] > > > > > > > > > > > > > > > > > > > > > > > > https://drive.google.com/file/d/1Vp8W-7Zk_iKXPTAiBT-eLPmCMd_I57Ty/view > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > D. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > > Best > > > > > > > > > > > > > > > > > > > > > > ConradJam > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >