Re: [DISCUSS] FLIP-XXX: Aligning timeout logic in the AdaptiveScheduler's WaitingForResources and Executing states

Matthias Pohl Tue, 16 Jul 2024 08:49:21 -0700

Thanks Zdenek for your proposal on aligning the resource control logic
within the AdaptiveScheduler and cleaning up the rescaling code.


Consolidating the parameters and the code as part of the 2.0 release makes
sense in my opinion: The proposed change adds consistent behavior to the
WaitingForResources and Executing states of the AdaptiveScheduler and irons
out some flaws of the current implementation. This should help users get a
clearer picture of the resource control logic. Removing obsolete rescale
waiting time if only sufficient resources are available is also a nice
improvement.

The j.a.min-parallelism-increase [1] parameter has become largely obsolete
with the introduction of the rescale REST endpoint in FLIP-291 [2], as you
pointed out in the FLIP. So, deprecating it sounds reasonable.

On the topic of replacing the j.a.scaling-interval.max parameter [3] with
the j.a.resource-stabilization-timeout [4]: I'm in favor of reducing the
complexity of the Flink configuration. Therefore, using a single parameter
to stabilize the resources in both the WaitingForResources and Executing
states sounds like a good idea.
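For illustration, the shared start/rescale decision can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Flink's implementation: `decide` and `Decision` are hypothetical names, one slot is assumed per parallel subtask, FLIP-291's per-vertex lower/upper parallelism bounds are collapsed into one job-level pair, `stable_for_s` is taken to be the time since the last resource change event, and both checkpoint alignment (FLIP-461) and scale-down are left out. "Sufficient" and "desired" are used as in the description of j.a.resource-stabilization-timeout [4]: enough slots for the lower bound versus enough for the upper bound.

```python
from enum import Enum
from typing import Tuple


class Decision(Enum):
    KEEP = "keep the current parallelism"
    WAIT = "wait for the resources to stabilize"
    RESCALE = "rescale"


def decide(current: int, available_slots: int, lower: int, upper: int,
           stable_for_s: float,
           stabilization_timeout_s: float = 10.0) -> Tuple[Decision, int]:
    """Hypothetical model: returns the decision and the resulting parallelism.

    Assumes one slot per subtask and a single job-level lower/upper bound
    (a simplification of FLIP-291's per-vertex bounds).
    """
    target = min(available_slots, upper)
    # Nothing to gain over the current parallelism, or the lower bound
    # (sufficient resources) cannot be met yet.
    if target <= current or target < lower:
        return Decision.KEEP, current
    # Desired resources (upper bound reachable): no stabilization wait.
    if available_slots >= upper:
        return Decision.RESCALE, target
    # Only sufficient resources: rescale once they were stable long enough.
    if stable_for_s >= stabilization_timeout_s:
        return Decision.RESCALE, target
    return Decision.WAIT, current
```

Passing `current=0` models a job start in WaitingForResources. In Yuanfeng Hu's scenario from the quoted thread (parallelism 1, a second slot that has been stable for more than 10 seconds, requirements raised to an upper bound of 3 with a lower bound of 1), `decide(1, 2, 1, 3, 15.0)` returns `(Decision.RESCALE, 2)`. With lower = upper = 3, `decide(1, 2, 3, 3, 60.0)` keeps the job at parallelism 1 until a third slot becomes available, matching Zdenek's description.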

I'm wondering whether there are scenarios where we would want to have
different stabilization timeouts for starting (WaitingForResources) and
rescaling (Executing) a job. In that case, having two resource
stabilization parameters (one for job starts and one for rescales), with one
being the fallback for the other, is a straightforward solution.
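The fallback idea could look roughly like the following Python sketch. It models the configuration as a plain dictionary of durations rather than Flink's Configuration/ConfigOption classes, and the rescale-specific key reuses the name Zdenek suggests in the quoted reply (`jobmanager.adaptive-scheduler.scaling-stabilization-timeout`); that option is hypothetical and does not exist in Flink today. The 10-second default is the one referred to in this thread.

```python
from datetime import timedelta
from typing import Mapping

# Existing option; 10 s is the default referred to in this thread.
RESOURCE_STABILIZATION_TIMEOUT = (
    "jobmanager.adaptive-scheduler.resource-stabilization-timeout")
# Hypothetical rescale-specific option (name proposed in this thread only).
SCALING_STABILIZATION_TIMEOUT = (
    "jobmanager.adaptive-scheduler.scaling-stabilization-timeout")
DEFAULT_STABILIZATION_TIMEOUT = timedelta(seconds=10)


def stabilization_timeout(config: Mapping[str, timedelta],
                          rescaling: bool) -> timedelta:
    """Resolve the effective timeout for a job start or a rescale.

    The (hypothetical) rescale option falls back to the job-start option,
    which in turn falls back to its default.
    """
    start_timeout = config.get(RESOURCE_STABILIZATION_TIMEOUT,
                               DEFAULT_STABILIZATION_TIMEOUT)
    if not rescaling:
        return start_timeout
    return config.get(SCALING_STABILIZATION_TIMEOUT, start_timeout)
```

With only the existing option set to 30 s, both job starts and rescales wait 30 s; setting the rescale-specific key overrides the value for rescales only. In Flink itself, this could presumably be expressed with the fallback-key mechanism of ConfigOption.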

Just as a side note, because it came up: keep in mind that FLIP-461 still
allows for immediate rescaling on a change event if checkpointing is
disabled or j.a.max-delay-for-scale-trigger [5] is configured accordingly.
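The FLIP-461 timing mentioned above can be approximated with the following Python sketch. It is a simplified, hypothetical model (the function name and parameters are assumptions, not Flink APIs): a rescale is triggered at the first checkpoint that completes after the change event or once the maximum delay has passed, whichever comes first, and immediately when checkpointing is disabled. Failed-checkpoint handling and the real default values of j.a.max-delay-for-scale-trigger [5] are not modeled.

```python
from typing import Iterable


def rescale_trigger_time(change_time_s: float,
                         checkpoint_completions_s: Iterable[float],
                         max_delay_s: float,
                         checkpointing_enabled: bool = True) -> float:
    """Hypothetical model of when a rescale fires after a change event."""
    if not checkpointing_enabled:
        # Simplification: nothing to align with, so rescale right away.
        return change_time_s
    deadline = change_time_s + max_delay_s
    # Only checkpoints completing at or after the change event count.
    later = [t for t in checkpoint_completions_s if t >= change_time_s]
    if not later:
        return deadline
    return min(min(later), deadline)
```

A max delay of 0 makes the rescale immediate even with checkpointing enabled, e.g. `rescale_trigger_time(100.0, [130.0], 0.0)` yields `100.0`, while `rescale_trigger_time(100.0, [90.0, 130.0], 60.0)` waits for the checkpoint at `130.0`.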

Best,
Matthias

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-min-parallelism-increase
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
[3]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max
[4]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout
[5]
https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-max-delay-for-scale-trigger



On Tue, Jul 16, 2024 at 3:05 PM Zdenek Tison <zti...@confluent.io.invalid>
wrote:

> Hi, I'd like to move a discussion from Google Docs to the mailing list so
> that it's visible to everyone.
>
> *Yuanfeng Hu* brought up two concerns:
>
> 1) Regarding the resource-stabilization-timeout, he thinks 10s may be too
> short. In a container environment, if the number of tm added by rest
> requests is greater than 1, the TM initialization time may be much longer
> than 10s.
>
> and
>
> 2) He proposed the following scenario:
> There is 1 slot in the entire cluster. At this time, my job is running with
> a parallelism of 1 (the required slot count is also 1). Then I add a TM (1
> slot), which will obviously trigger a change event, and the resources will
> become stable after 10 seconds. If I change the required resources to 3
> through REST at this time, the rescale will be triggered immediately and
> the job will run with a parallelism of 2. Is this the expected result, or
> do we expect the rescale to be triggered only after adding another TM,
> because that would exactly match the required resources?
>
> Thank you, *Yuanfeng Hu*, for opening the discussion.
>
>
> ---------------------------------------------------------------------------------------
>
> 1) Regarding the stabilization period:
>
> I am unsure what you mean by the part 'if the number of tm added by rest
> requests is greater than 1.' However, I understand that it can take some
> time to spawn additional containers/pods in a containerized environment. On
> the other hand, if a user adds more TMs, for instance, by increasing the
> number of replicas in a Kubernetes deployment, these replicas should appear
> with some delay but at roughly the same time, correct?
>
> It's worth mentioning that since FLIP-461
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler>,
> the rescale operation is synchronized with checkpoint events, so the rescale
> doesn't happen right after this timeout expires.
>
> If we believe it is necessary to have different values for the
> stabilization period in the Executing and WaitingForResources states, even
> though this increases configuration complexity slightly, we could have
> separate parameters for these two states:
> jobmanager.adaptive-scheduler.resource-stabilization-timeout
> <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout>
> and *jobmanager.adaptive-scheduler.scaling-stabilization-timeout* (replacing
> jobmanager.adaptive-scheduler.scaling-interval.max
> <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max>).
>
>
> 2) Regarding the proposed scenario:
>
> The same behavior occurs in the current Flink version when
> `min-parallelism-increase` is set to its default value of 1. In this case,
> the rescale operation is triggered either immediately or aligned with the
> checkpoint event (as specified in FLIP-461).
> So, I would say the behavior is expected.
> Additionally, users can configure the rescaling behavior. For example, if a
> user sets the lower bound of the parallelism to 2 and the upper bound to 3,
> the system will rescale after the 10-second stabilization period.
> Alternatively, if the user sets the same value for the lower and upper
> bounds, the rescale operation will wait until all slots are available.
>
> Best Regards,
> Zdenek Tison
>
>
>
>
> On Thu, Jul 11, 2024 at 2:38 PM Zdenek Tison <zti...@confluent.io> wrote:
>
> > Hello,
> >
> > Our team has been working on several improvements to the AdaptiveScheduler,
> > specifically focusing on aligning logic and timeouts in the
> > WaitingForResources and Executing states. We believe these enhancements
> > will improve the adaptive scheduler's robustness and maintainability.
> >
> > For more detailed information, please refer to the FLIP document.
> >
> >
> https://docs.google.com/document/d/1YeYSs64LqgUr3xyBTCjiRE-CT5VEyHjGjqxnxKPIQhM/edit?usp=sharing
> >
> > Thanks,
> > Zdenek Tison
> >
>
