Thank you everyone for the clarification. It looks like the Rescale API will require some more work before it can be fully taken advantage of. I'm wondering whether, in addition to the Rescale API, we want to provide a means to rescale without an explicit API call, as part of the job submission.
Clearly, the Rescale API + the Adaptive Scheduler will provide a better "rescale experience" in case of a running job. But some users have implemented autoscaling in a way that will update the existing job deployment in k8s to trigger a redeploy of the job with an updated configuration and job graph parallelisms. The problem is that we only allow setting the default parallelism so far. Would something along the lines of https://issues.apache.org/jira/browse/FLINK-29501 be conceivable?

-Max

On Wed, Oct 12, 2022 at 5:58 AM Jiangang Liu <liujiangangp...@gmail.com> wrote:

> Thanks for the attention to the rescale api. Dynamic resource adjustment is
> useful for streaming jobs since the throughput can change at different
> times. The rescale api is a lightweight way to change the job's parallelism.
> This is important for some jobs, for example, jobs that are in activities
> or related to money, which cannot be delayed.
> In our production scenario, we have supported a simple rescale api which
> may not be perfect. I'd like to take this chance to suggest supporting the
> rescale api in the adaptive scheduler for auto-scaling.
>
> Chesnay Schepler <ches...@apache.org> wrote on Tue, Oct 11, 2022 at 20:36:
>
>> The AdaptiveScheduler is not limited to reactive mode. There are no
>> deployment limitations for the scheduler itself.
>> In a nutshell, all that reactive mode does is crank the target
>> parallelism to infinity, when usually it is the parallelism the user has
>> set in the job/configuration.
>>
>> I think it would be fine if a new/revised rescale API were only
>> available in the Adaptive Scheduler (without reactive mode!) for starters.
>> We'd require way more stuff to make this useful for batch workloads.
>>
>> On 10/10/2022 16:47, Maximilian Michels wrote:
>> > Hey Gyula,
>> >
>> > Is the Adaptive Scheduler limited to the Reactive mode? I agree that if we
>> > move forward with the Adaptive Scheduler solution it should support all
>> > deployment scenarios.
>> >
>> > Thanks,
>> > -Max
>> >
>> > On Sun, Oct 9, 2022 at 6:10 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>> >
>> >> Hi!
>> >>
>> >> I think we have to make sure that the Rescale API will work also without
>> >> the adaptive scheduler (for instance when we are running Flink with the
>> >> Kubernetes Native Integration or in other cases where the adaptive
>> >> scheduler is not supported).
>> >>
>> >> What do you think?
>> >>
>> >> Cheers
>> >> Gyula
>> >>
>> >> On Fri, Oct 7, 2022 at 3:50 PM Maximilian Michels <m...@apache.org> wrote:
>> >>
>> >>> We've been looking into ways to support programmatic rescaling of job
>> >>> vertices. This feature is typically required for any type of Flink
>> >>> autoscaler which does not merely set the default parallelism but adjusts
>> >>> the parallelisms on a JobVertex level.
>> >>>
>> >>> We've had an initial discussion here:
>> >>> https://issues.apache.org/jira/browse/FLINK-29501 where Chesnay suggested
>> >>> to use the infamous "rescaling" API:
>> >>> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling
>> >>> This API is disabled as of
>> >>> https://issues.apache.org/jira/browse/FLINK-12312.
>> >>>
>> >>> Since there is the Adaptive Scheduler in Flink now, it may be feasible to
>> >>> re-enable the API (at least for streaming jobs) and allow overriding the
>> >>> parallelism of job vertices in addition to the default parallelism.
>> >>>
>> >>> Any thoughts?
>> >>>
>> >>> -Max