Re: [DISCUSS] FLIP-XXX: Aligning timeout logic in the AdaptiveScheduler's WaitingForResources and Executing states

Zdenek Tison Thu, 18 Jul 2024 00:59:23 -0700

Thanks, Matthias, for your feedback.

I see two scenarios where different values for starting and rescaling would
be appropriate:

1) Flink serverless providers may prefer the fastest possible job startup
time, which can be achieved by setting a small stabilization timeout, such
as 1 second, in the WaitingForResources state. Conversely, to maximize job
uptime, it would be prudent to set a longer stabilization period for
rescaling, such as 1 minute, so that server/node maintenance is handled
effectively.

2) In Reactive mode, the stabilization period is set to 0 by default.
Setting a different default value for rescaling (the Executing state) could
improve job stability during node maintenance, especially since the
min-parallelism-increase parameter is no longer applicable.
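To make the "one parameter falls back to the other" idea from this thread concrete, here is a minimal Python sketch (not Flink code) of how two stabilization timeouts could be resolved from a plain configuration map. The key jobmanager.adaptive-scheduler.scaling-stabilization-timeout is only the name floated in this discussion, the 10s default is taken from the value discussed in the thread, and the helper resolve_stabilization_timeouts is hypothetical.

```python
from datetime import timedelta

# Existing Flink option, used for the WaitingForResources state.
START_KEY = "jobmanager.adaptive-scheduler.resource-stabilization-timeout"
# Hypothetical option: only a name proposed in this discussion.
RESCALE_KEY = "jobmanager.adaptive-scheduler.scaling-stabilization-timeout"
# Assumed default, taken from the 10s value discussed in this thread.
DEFAULT_STABILIZATION = timedelta(seconds=10)


def resolve_stabilization_timeouts(conf):
    """Return (start_timeout, rescale_timeout) from a plain dict.

    Hypothetical helper: the rescale timeout falls back to the start
    timeout, so configuring a single value keeps both states aligned.
    """
    start = conf.get(START_KEY, DEFAULT_STABILIZATION)
    rescale = conf.get(RESCALE_KEY, start)
    return start, rescale


# Scenario 1: fast startup, but a longer window before rescaling.
serverless = {START_KEY: timedelta(seconds=1),
              RESCALE_KEY: timedelta(minutes=1)}
print(resolve_stabilization_timeouts(serverless))

# Only the start timeout is set: rescaling reuses the same value.
print(resolve_stabilization_timeouts({START_KEY: timedelta(seconds=5)}))
```

With this fallback, existing configurations that only set resource-stabilization-timeout keep one consistent timeout for both states; the second key only matters for users who opt in, as in the two scenarios above.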

Regards,

Zdenek

On Tue, Jul 16, 2024 at 5:49 PM Matthias Pohl <map...@apache.org> wrote:

> Thanks Zdenek for your proposal on aligning the resource control logic
> within the AdaptiveScheduler and cleaning up the rescaling code.
>
> Consolidating the parameters and the code as part of the 2.0 release makes
> sense in my opinion: The proposed change adds consistent behavior to the
> WaitingForResources and Executing states of the AdaptiveScheduler and irons
> out some flaws of the current implementation. This should help users get a
> clearer picture of the resource control logic. Removing the obsolete rescale
> waiting time if only sufficient resources are available is also a nice
> improvement.
>
> The j.a.min-parallelism-increase [1] parameter became kind of obsolete with
> the introduction of the rescale REST endpoint in FLIP-291 [2] as you
> pointed out in the FLIP. So, deprecating it sounds reasonable.
>
> On the topic of replacing the j.a.scaling-interval.max parameter [3] with
> the j.a.resource-stabilization-timeout [4]: I'm in favor of reducing the
> complexity of the Flink configuration. Therefore, using one parameter for
> both (WaitingForResources and Executing state) to stabilize the resources
> sounds like a good idea.
>
> I'm wondering whether there are scenarios where we would want to have
> different stabilization timeouts for starting (WaitingForResources) and
> rescaling (Executing) a job. In that case, having two resource
> stabilization parameters (one for job starts and one for rescales), with one
> being the fallback for the other, would be a straightforward solution.
>
> Just as a side note because it came up: Keep in mind that FLIP-461 still
> allows for immediate rescaling on a change event if checkpointing is
> disabled or j.a.max-delay-for-scale-trigger [5] is configured accordingly.
>
> Best,
> Matthias
>
> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-min-parallelism-increase
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-291%3A+Externalized+Declarative+Resource+Management
> [3] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max
> [4] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout
> [5] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-max-delay-for-scale-trigger
>
> On Tue, Jul 16, 2024 at 3:05 PM Zdenek Tison <zti...@confluent.io.invalid> wrote:
>
> > Hi, I'd like to move a discussion from Google Docs to the mailing list so
> > that it's visible to everyone.
> >
> > *Yuanfeng Hu* brought up two concerns:
> >
> > 1) Regarding the resource-stabilization-timeout, he thinks 10s may be too
> > short. In a container environment, if the number of tm added by rest
> > requests is greater than 1, the TM initialization time may be much longer
> > than 10s.
> >
> > and
> >
> > 2) He proposed a small scenario:
> > There is 1 slot in the entire cluster. At this time, my job is running
> > with parallelism 1 (the number of required slots is also 1). Then I add a
> > TM (1 slot), which will obviously trigger a change event, and the
> > resources will become stable after 10 seconds. If I change the required
> > resources to 3 through REST at this time, a rescale will be triggered
> > immediately and the job will run with parallelism 2. Is this the expected
> > result, or do we expect the rescale to be triggered only after adding
> > another TM, because that exactly matches the required resources?
> >
> > Thank you, *Yuanfeng Hu*, for opening the discussion.
> >
> > -----------------------------------------------------------------------
> >
> > 1) Regarding the stabilization period:
> >
> > I am not sure what you mean by 'if the number of tm added by rest
> > requests is greater than 1.' However, I understand that it can take some
> > time to spawn additional containers/pods in a containerized environment.
> > On the other hand, if a user adds more TMs, for instance, by increasing
> > the number of replicas in a Kubernetes deployment, these replicas should
> > appear with some delay but at roughly the same time, correct?
> >
> > It's worth mentioning that since FLIP-461
> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-461%3A+Synchronize+rescaling+with+checkpoint+creation+to+minimize+reprocessing+for+the+AdaptiveScheduler>,
> > the rescale operation is synchronized with checkpoint events, so the
> > rescale doesn't happen right after this timeout expires.
> >
> > If we believe it is necessary to have different values for the
> > stabilization period in the Executing and WaitingForResources states,
> > even though this slightly increases configuration complexity, we could
> > have separate parameters for these two states:
> > jobmanager.adaptive-scheduler.resource-stabilization-timeout
> > <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout>
> > and *jobmanager.adaptive-scheduler.scaling-stabilization-timeout*
> > (replacing jobmanager.adaptive-scheduler.scaling-interval.max
> > <https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-scaling-interval-max>).
> >
> > 2) Regarding the proposed scenario:
> >
> > The same behavior occurs in the current Flink version when
> > `min-parallelism-increase` is set to its default value of 1. In this
> > case, the rescale operation is triggered immediately or aligned with the
> > checkpoint event (as specified in FLIP-461).
> > So, I would say the behavior is expected.
> > Additionally, users can configure the rescaling behavior. For example, if
> > a user sets the lower bound of the parallelism to 2 and the upper bound
> > to 3, the system will rescale after 10 seconds. Alternatively, if the
> > user sets the same value for the lower and upper bounds, the rescale
> > operation will wait until all slots are available.
> >
> > Best Regards,
> > Zdenek Tison
> >
> > On Thu, Jul 11, 2024 at 2:38 PM Zdenek Tison <zti...@confluent.io> wrote:
> >
> > > Hello,
> > >
> > > Our team has been working on several improvements for the
> > > AdaptiveScheduler, specifically focusing on aligning logic and timeouts
> > > in the WaitingForResources and Executing states. We believe these
> > > enhancements will improve the adaptive scheduler's robustness and
> > > maintainability.
> > >
> > > For more detailed information, please refer to the FLIP document:
> > >
> > > https://docs.google.com/document/d/1YeYSs64LqgUr3xyBTCjiRE-CT5VEyHjGjqxnxKPIQhM/edit?usp=sharing
> > >
> > > Thanks,
> > > Zdenek Tison
> > >
> >
>
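The scenario raised by Yuanfeng Hu, and the answer to it, can be summarized as a small decision function. The Python below is a simplified, hypothetical model rather than AdaptiveScheduler code: it assumes one slot per subtask, treats the lower and upper bounds as the values set through the FLIP-291 rescale REST endpoint, assumes that reaching the upper bound (the desired resources) skips the stabilization wait, and ignores the checkpoint alignment from FLIP-461, which can delay the actual rescale further. The function name decide_rescale is illustrative only.

```python
from datetime import timedelta
from typing import Optional


def decide_rescale(current: int, available_slots: int, lower_bound: int,
                   upper_bound: int, stable_for: timedelta,
                   stabilization_timeout: timedelta = timedelta(seconds=10),
                   ) -> Optional[int]:
    """Return the parallelism to rescale to, or None to keep waiting.

    Hypothetical, simplified model: one slot per subtask, bounds set via
    the rescale REST endpoint, checkpoint alignment (FLIP-461) ignored.
    """
    # Highest parallelism the available slots allow, capped by the
    # upper bound.
    target = min(available_slots, upper_bound)
    if target <= current:
        return None  # no parallelism increase is possible
    if target < lower_bound:
        return None  # insufficient resources: wait for more slots
    if target == upper_bound:
        return target  # desired resources reached (assumed: no wait)
    if stable_for >= stabilization_timeout:
        return target  # sufficient resources, stable long enough
    return None  # sufficient resources, still stabilizing


# Yuanfeng's scenario; a lower bound of 1 is an assumption here.
print(decide_rescale(1, 2, 1, 3, timedelta(seconds=12)))  # 2
# Lower bound 2, upper bound 3: waits for the stabilization timeout.
print(decide_rescale(1, 2, 2, 3, timedelta(seconds=0)))  # None
# Equal bounds: waits until all 3 slots are available.
print(decide_rescale(1, 2, 3, 3, timedelta(minutes=1)))  # None
```

In this model, raising the lower bound is what turns "rescale to whatever is available" into "wait for more slots", which matches the configuration options described in the reply above.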
