Sorry for joining the discussion late. I've gleaned a lot of useful information from all of you.

*@max* - When users repartition, we still need to restart the job. Can we try to do this part of the work internally instead of externally? As *@konstantin* said, rescaling could be triggered only once a checkpoint or retained checkpoint has completed, to minimize reprocessing.

*@konstantin* - I think you mentioned that two FLIPs are being drafted, which I consider a precondition for achieving *@max*'s goal. I would love to join this discussion and contribute to it. I've tried a native implementation of this part myself; if that can help the community, so much the better.

*@chesnay* - The docs section is confusing and invites misconceptions, as *@gyula* says. I'll see if I can fix it.

*About the Rescale API*

Some limitations and differences between *default* and *reactive mode* were discussed earlier, and *@chesnay* explained some of their limitations and behaviors; essentially they are two different things. I agree that once *reactive mode* is ready, it should become the default mode for *stream processing* jobs.

As for the *Rescale API* [1], as far as we know it is currently unusable. I believe the goal of this API is to allow fast changes to parallelism. I would like to wait until the discussion is over and the two draft FLIPs mentioned earlier are completed. It will not be too late then to decide whether to modify the *Rescale REST API* [2] to support changing the parallelism of individual job vertices.

[1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/deployment/elastic_scaling/
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-rescaling

Best~

On Tue, Jan 24, 2023 at 01:08, Maximilian Michels <m...@apache.org> wrote:

> Hi,
>
> The current rescale API appears to be a work in progress.
> A couple years ago, we disabled access to the API [1].
>
> I'm looking into this problem as part of working on autoscaling [2] where
> we currently require a full restart of the job to apply the parallelism
> overrides. This adds additional delay and comes with the caveat that we
> don't know whether sufficient resources are available prior to executing
> the scaling decision. We obviously do not want to get stuck due to a lack
> of resources. So a rescale API would have to ensure enough resources are
> available prior to restarting the job.
>
> I've created an issue here:
> https://issues.apache.org/jira/browse/FLINK-30773
>
> Any comments or interest in working on this?
>
> -Max
>
> [1] https://issues.apache.org/jira/browse/FLINK-12312
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling

--
Best
ConradJam