Re: Reworking the Rescale API

Chesnay Schepler Thu, 26 Jan 2023 09:23:42 -0800

There's the default and reactive mode; nothing else.

At their core they are the same thing; reactive mode just cranks up the desired parallelism to infinity and enforces certain assumptions (e.g., no active resource management).

The advantage is that the adaptive scheduler can run jobs while sufficient resources are not available, and scale things up again once they are available.

This is its core functionality, but we always intended to extend it such that users can modify the parallelism at runtime as well. And since the AS can already rescale jobs (and was purpose-built with that functionality in mind), this is just a matter of exposing an API for it. Everything else is already there.
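
To make the relationship between the two modes concrete, here is a toy model in Java. It is only a sketch of the idea described above, not Flink code: the class, the method effectiveParallelism and the constant UNBOUNDED are all made up for illustration, and the real scheduler works with slots and per-vertex parallelism rather than a single number.

```java
public class ParallelismSketch {

    // Hypothetical stand-in for the "infinite" desired parallelism of reactive mode.
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // Toy rule: run with as much parallelism as is desired,
    // capped by what the cluster can currently provide.
    static int effectiveParallelism(int desired, int availableSlots) {
        return Math.min(desired, availableSlots);
    }

    public static void main(String[] args) {
        int desired = 8; // user-provided parallelism (default mode)

        // Default mode: enough slots, so the job runs as requested.
        System.out.println(effectiveParallelism(desired, 8));    // 8
        // A TM is lost: the job keeps running at reduced parallelism.
        System.out.println(effectiveParallelism(desired, 6));    // 6
        // The slots come back: the job scales up again, but never past 8.
        System.out.println(effectiveParallelism(desired, 8));    // 8

        // Reactive mode: desired parallelism is unbounded,
        // so the job simply uses whatever is available.
        System.out.println(effectiveParallelism(UNBOUNDED, 6));  // 6
        System.out.println(effectiveParallelism(UNBOUNDED, 12)); // 12
    }
}
```

In this picture, letting users modify the parallelism at runtime amounts to changing the value of desired while the job is running.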

As a concrete use-case, let's say you have an SLA that says jobs must not be down longer than X seconds, and a TM just crashed. If you can absolutely guarantee that your k8s cluster can provision a new TM within X seconds, no matter what cruel reality has in store for you, then you /may/ not need it.

If you can't, well then here's a use-case for you.

> Last time I looked they implemented the same interface and the same base class. Of course, their behavior is quite different.

They never shared a base class since day 1. Are you maybe mixing up the AdaptiveScheduler and AdaptiveBatchScheduler?


As for FLINK-30773, I think that should be covered.

On 26/01/2023 17:10, Maximilian Michels wrote:

Thanks for the explanation. If not for the "reactive mode", what is
the advantage of the adaptive scheduler? What other modes does it
support?

Apart from implementing the same interface the implementations of the adaptive 
and default schedulers are separate.

Last time I looked they implemented the same interface and the same
base class. Of course, their behavior is quite different.

I'm still very interested in learning about the future FLIPs
mentioned. Based on the replies, I'm assuming that they will support
the changes required for
https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
the basis for implementing them.

-Max

On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler <ches...@apache.org> wrote:

On 26/01/2023 16:18, Maximilian Michels wrote:

I see slightly different goals for the standard and the adaptive
scheduler. The adaptive scheduler's goal is to adapt the Flink job
according to the available resources.

This is really a misconception that we just have to stomp out.

This statement only applies to reactive mode, a special mode the adaptive
scheduler (AS) can run in, where active resource management is not supported
because requesting infinite resources from k8s doesn't really make sense.

The AS itself can work perfectly fine with active resource management, and has
no effect on how the RM talks to k8s. It can just keep the job running in cases
where k8s provides fewer resources than desired (i.e., than the user-provided
parallelism requires), possibly only temporarily.

On 26/01/2023 16:18, Maximilian Michels wrote:

After
all, both schedulers share the same super class

Apart from implementing the same interface the implementations of the adaptive 
and default schedulers are separate.
