Thanks, Matthias, for your opinions. I see two scenarios where different values for starting and rescaling would be appropriate:

1) Flink serverless providers may prefer the fastest possible job startup time, which can be achieved by setting a small stabilization timeout, such as 1 second, in the WaitingForResources state. Conversely, to maximize job uptime, it would be prudent to increase the stabilization period for rescaling to a higher value, such as 1 minute, so that server/node maintenance is handled effectively.

2) In Reactive mode, the stabilization period is set to 0 by default. Setting a different default value for the rescale state could enhance job stability during node maintenance, especially since the parameter min-parallelism-increase is no longer applicable.

Regards,
Zdenek

On Tue, Jul 16, 2024 at 5:49 PM Matthias Pohl <map...@apache.org> wrote:

> Thanks Zdenek for your proposal on aligning the resource control logic
> within the AdaptiveScheduler and cleaning up the rescaling code.
>
> Consolidating the parameters and the code as part of the 2.0 release makes
> sense in my opinion: The proposed change adds consistent behavior to the
> WaitingForResources and Executing states of the AdaptiveScheduler and irons
> out some flaws of the current implementation. This should help users get a
> clearer picture of the resource control logic. Removing obsolete rescale
> waiting time if only sufficient resources are available is also a nice
> improvement.
>
> The j.a.min-parallelism-increase [1] parameter became kind of obsolete with
> the introduction of the rescale REST endpoint in FLIP-291 [2] as you
> pointed out in the FLIP. So, deprecating it sounds reasonable.
>
> On the topic of replacing the j.a.scaling-interval.max parameter [3] with
> the j.a.resource-stabilization-timeout [4]: I'm in favor of reducing the
> complexity of the Flink configuration. Therefore, using one parameter for
> both (WaitingForResources and Executing state) to stabilize the resources
> sounds like a good idea.
>
> I'm wondering whether there are scenarios where we would want to have
> different stabilization timeouts for starting (WaitingForResources) and
> rescaling (Executing) a job. In that case, having two resource
> stabilization parameters (one for job starts and one for rescales) with one
> being the fallback for the other is a straightforward solution.
>
> Just as a side note because it came up: Keep in mind that FLIP-461 still
> allows for immediate rescaling on a change event if checkpointing is
> disabled or j.a.max-delay-for-scale-trigger [5] is configured accordingly.
>
> Best,
> Matthias
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-min-parallelism-increase
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> [3]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max
> [4]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout
> [5]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-max-delay-for-scale-trigger
>
> On Tue, Jul 16, 2024 at 3:05 PM Zdenek Tison <zti...@confluent.io.invalid>
> wrote:
>
> > Hi, I'd like to move a discussion from Google Docs to the mailing list so
> > that it's visible to everyone.
> >
> > *Yuanfeng Hu* brought up two concerns:
> >
> > 1) Related to the resource-stabilization-timeout, he thinks 10s may be too
> > short. In a container environment, if the number of tm added by rest
> > requests is greater than 1, the tm initialization time may be much longer
> > than 10s.
> >
> > and
> >
> > 2) He proposed a little scenario:
> > There is 1 slot in the entire cluster. At this time, my task is running at
> > 1 parallelism (the required slot is also 1). Then I add a tm (1 slot), which
> > will obviously trigger a change event, and it will become stable after 10
> > seconds. If I change the required resources to 3 through rest at this time,
> > rescale will be triggered immediately and the job runs at a parallelism of
> > 2. Is this the expected result, or do we expect that the rescale will be
> > triggered after adding another tm, because this exactly matches the
> > required resources?
> >
> > Thank you, *Yuanfeng Hu*, for opening the discussion.
> >
> > ---------------------------------------------------------------------------------------
> >
> > 1) Regarding the stabilization period:
> >
> > I am unsure what you mean by the part, 'if the number of tm added by rest
> > requests is greater than 1.' However, I understand that it can take some
> > time to spawn additional containers/pods in a containerized environment. On
> > the other hand, if a user adds more TMs, for instance, by increasing the
> > number of replicas in a Kubernetes deployment, these replicas should appear
> > with some delay but at a similar time, correct?
> >
> > It's worth mentioning that since FLIP-461
> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler>,
> > the rescale operation is synchronized with checkpoint events, so the rescale
> > doesn't happen right after this timeout expires.
> >
> > If we believe it is necessary to have different values for the
> > stabilization period in the Executing and WaitingForResources states, even
> > though this increases configuration complexity slightly, we could have
> > separate parameters for these two states:
> > jobmanager.adaptive-scheduler.resource-stabilization-timeout
> > <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout>
> > and *jobmanager.adaptive-scheduler.scaling-stabilization-timeout*
> > (replacing the jobmanager.adaptive-scheduler.scaling-interval.max
> > <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max>).
> >
> > *2)* Regarding the proposed scenario:
> >
> > The same behavior occurs in the current Flink version when the
> > `min-parallelism-increase` is set to its default value 1. In this case, the
> > rescale operation is triggered immediately or aligned with the checkpoint
> > event (specified in FLIP-461).
> > So, I would say the behavior is expected.
> > Additionally, users can configure the rescaling behavior. For example, if a
> > user sets the lower bound parallelism to 2 and the upper bound to 3, the
> > system will rescale after 10 seconds. Alternatively, if the user sets the
> > same value for the lower and upper bounds, the rescale operation will wait
> > until all slots are available.
> >
> > Best regards,
> > Zdenek Tison
> >
> > On Thu, Jul 11, 2024 at 2:38 PM Zdenek Tison <zti...@confluent.io> wrote:
> >
> > > Hello,
> > >
> > > Our team has been working on several improvements for AdaptiveScheduler,
> > > specifically focusing on aligning logic and timeouts in the
> > > WaitingForResources and Executing states. We believe these enhancements
> > > will improve the adaptive scheduler's robustness and maintainability.
> > >
> > > For more detailed information, please refer to the FLIP document.
> > >
> > > https://docs.google.com/document/d/1YeYSs64LqgUr3xyBTCjiRE-CT5VEyHjGjqxnxKPIQhM/edit?usp=sharing
> > >
> > > Thanks,
> > > Zdenek Tison
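
P.S. To make the "two parameters, one being the fallback for the other" idea from this thread concrete, here is a minimal sketch in Python. It is not Flink code: the function name is made up for illustration, the scaling-stabilization-timeout key is only the name proposed in this thread, the 10 s default is simply the value discussed above, and the fallback direction (rescale timeout falls back to the startup timeout) is an assumption.

```python
from datetime import timedelta

# Key names as they appear in this thread. The second key is only a proposal
# here and may not exist in any Flink release.
RESOURCE_TIMEOUT = "jobmanager.adaptive-scheduler.resource-stabilization-timeout"
SCALING_TIMEOUT = "jobmanager.adaptive-scheduler.scaling-stabilization-timeout"

# Assumed default: the 10 s value discussed in the thread.
DEFAULT_RESOURCE_TIMEOUT = timedelta(seconds=10)


def stabilization_timeouts(config):
    """Return (startup_timeout, rescale_timeout) from a dict of timedelta values.

    Hypothetical helper: the rescale timeout falls back to the startup
    timeout when it is not set explicitly.
    """
    startup = config.get(RESOURCE_TIMEOUT, DEFAULT_RESOURCE_TIMEOUT)
    rescale = config.get(SCALING_TIMEOUT, startup)
    return startup, rescale


# Scenario 1 from the reply: fast startup (1 second), cautious rescaling (1 minute).
serverless = {
    RESOURCE_TIMEOUT: timedelta(seconds=1),
    SCALING_TIMEOUT: timedelta(minutes=1),
}
startup, rescale = stabilization_timeouts(serverless)
print(startup.total_seconds(), rescale.total_seconds())
```

With only the first key set, both states would share one timeout, which keeps the single-parameter configuration working unchanged.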