Thanks Zdenek for your proposal on aligning the resource control logic within the AdaptiveScheduler and cleaning up the rescaling code.

Consolidating the parameters and the code as part of the 2.0 release makes sense in my opinion: The proposed change adds consistent behavior to the WaitingForResources and Executing states of the AdaptiveScheduler and irons out some flaws of the current implementation. This should help users get a clearer picture of the resource control logic. Removing the obsolete rescale waiting time when only sufficient resources are available is also a nice improvement.

The j.a.min-parallelism-increase [1] parameter became more or less obsolete with the introduction of the rescale REST endpoint in FLIP-291 [2], as you pointed out in the FLIP. So, deprecating it sounds reasonable.

On the topic of replacing the j.a.scaling-interval.max parameter [3] with the j.a.resource-stabilization-timeout [4]: I'm in favor of reducing the complexity of the Flink configuration. Therefore, using one parameter for both states (WaitingForResources and Executing) to stabilize the resources sounds like a good idea. I'm wondering whether there are scenarios where we would want different stabilization timeouts for starting (WaitingForResources) and rescaling (Executing) a job. In that case, having two resource stabilization parameters (one for job starts and one for rescales), with one being the fallback for the other, is a straightforward solution.

Just as a side note because it came up: Keep in mind that FLIP-461 still allows for immediate rescaling on a change event if checkpointing is disabled or j.a.max-delay-for-scale-trigger [5] is configured accordingly.
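
To make the fallback idea concrete, here is a minimal sketch in Python. It is not Flink code. The first key is the existing option [4]; the second key is only the name proposed further down in this thread and is hypothetical, as are the function name and the 10 s default (taken from the value discussed in the thread).

```python
from datetime import timedelta

# Existing option, see [4].
RESOURCE_STABILIZATION = "jobmanager.adaptive-scheduler.resource-stabilization-timeout"
# Hypothetical option: only a name proposed in this thread, not an existing key.
SCALING_STABILIZATION = "jobmanager.adaptive-scheduler.scaling-stabilization-timeout"

# Assumed default, based on the 10 s mentioned in the thread.
DEFAULT_TIMEOUT = timedelta(seconds=10)


def stabilization_timeout(config: dict, state: str) -> timedelta:
    """Resolve the stabilization timeout for a given scheduler state."""
    if state not in ("WaitingForResources", "Executing"):
        raise ValueError(f"unexpected state: {state}")
    # The job-start timeout is the base value for both states.
    start_timeout = config.get(RESOURCE_STABILIZATION, DEFAULT_TIMEOUT)
    if state == "WaitingForResources":
        return start_timeout
    # Executing: prefer the rescale-specific value, else fall back.
    return config.get(SCALING_STABILIZATION, start_timeout)
```

With this shape, users who set only the existing option get the same timeout in both states, and the second option is needed only when the two should differ.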

Best,
Matthias

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-min-parallelism-increase
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
[3] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max
[4] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout
[5] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-max-delay-for-scale-trigger

On Tue, Jul 16, 2024 at 3:05 PM Zdenek Tison <zti...@confluent.io.invalid> wrote:

> Hi,
>
> I'd like to move a discussion from Google Docs to the mailing list so
> that it's visible to everyone.
>
> *Yuanfeng Hu* brought up two concerns:
>
> 1) Related to the resource-stabilization-timeout, he thinks 10s may be too
> short. In a container environment, if the number of TMs added by REST
> requests is greater than 1, the TM initialization time may be much longer
> than 10s.
>
> and
>
> 2) He proposed a little scenario:
> There is 1 slot in the entire cluster. At this time, my task is running at
> a parallelism of 1 (the required number of slots is also 1). Then I add a
> TM (1 slot), which will obviously trigger a change event, and it will
> become stable after 10 seconds. If I change the required resources to 3
> through REST at this time, a rescale will be triggered immediately, and the
> job runs at a parallelism of 2. Is this the expected result, or do we
> expect that the rescale will be triggered after adding another TM, because
> this exactly matches the required resources?
>
> Thank you, *Yuanfeng Hu*, for opening the discussion.
>
> ---------------------------------------------------------------------------------------
>
> 1) Regarding the stabilization period:
>
> I am unsure what you mean by the part, 'if the number of tm added by rest
> requests is greater than 1.' However, I understand that it can take some
> time to spawn additional containers/pods in a containerized environment. On
> the other hand, if a user adds more TMs, for instance, by increasing the
> number of replicas in a Kubernetes deployment, these replicas should appear
> with some delay but at a similar time, correct?
>
> It's worth mentioning that since FLIP-461
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler>,
> the rescale operation is synchronized with checkpoint events, so the
> rescale doesn't happen right after this timeout expires.
>
> If we believe it is necessary to have different values for the
> stabilization period in the Executing and WaitingForResources states, even
> though this increases configuration complexity slightly, we could have
> separate parameters for these two states:
> jobmanager.adaptive-scheduler.resource-stabilization-timeout
> <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout>
> and *jobmanager.adaptive-scheduler.scaling-stabilization-timeout*
> (replacing the jobmanager.adaptive-scheduler.scaling-interval.max
> <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max>).
>
> *2)* Regarding the proposed scenario:
>
> The same behavior occurs in the current Flink version when the
> `min-parallelism-increase` is set to its default value 1. In this case, the
> rescale operation is triggered immediately or aligned with the checkpoint
> event (specified in FLIP-461).
> So, I would say the behavior is expected.
> Additionally, users can configure the rescaling behavior. For example, if a
> user sets the lower bound parallelism to 2 and the upper bound to 3, the
> system will rescale after 10 seconds. Alternatively, if the user sets the
> same value for the lower and upper bounds, the rescale operation will wait
> until all slots are available.
>
> Best regards,
> Zdenek Tison
>
> On Thu, Jul 11, 2024 at 2:38 PM Zdenek Tison <zti...@confluent.io> wrote:
>
> > Hello,
> >
> > Our team has been working on several improvements for AdaptiveScheduler,
> > specifically focusing on aligning logic and timeouts in the
> > WaitingForResources and Executing states. We believe these enhancements
> > will improve the adaptive scheduler's robustness and maintainability.
> >
> > For more detailed information, please refer to the FLIP document.
> >
> > https://docs.google.com/document/d/1YeYSs64LqgUr3xyBTCjiRE-CT5VEyHjGjqxnxKPIQhM/edit?usp=sharing
> >
> > Thanks,
> > Zdenek Tison