There are the default mode and reactive mode; nothing else.
At their core they are the same thing; reactive mode just cranks up the
desired parallelism to infinity and enforces certain assumptions (e.g.,
no active resource management).
The advantage is that the adaptive scheduler can run jobs while
insufficient resources are available, and scale them up again once the
resources become available.
This is its core functionality, but we always intended to extend it
such that users can modify the parallelism at runtime as well.
And since the AS can already rescale jobs (and was purpose-built with
that functionality in mind), this is just a matter of exposing an API
for it. Everything else is already there.
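To make the relationship between the two modes concrete, here is a toy model
of the behaviour described above. It is only a sketch, not Flink code: the
function name and the REACTIVE constant are made up for illustration, and it
assumes the job simply runs with as many slots as it can get, capped at the
desired parallelism.

```python
import math


def effective_parallelism(desired, available_slots):
    """Hypothetical helper: parallelism the job would run with.

    The job never exceeds the desired parallelism, and never uses more
    slots than are currently available.
    """
    return min(desired, available_slots)


# Default mode: desired parallelism is what the user configured.
# Reactive mode: desired parallelism is treated as unbounded ("infinity").
REACTIVE = math.inf

# A TM crashed, only 5 of the 8 desired slots are left: keep running at 5.
print(effective_parallelism(8, 5))

# The replacement TM arrived, 12 slots available: scale back up to 8.
print(effective_parallelism(8, 12))

# Reactive mode always uses whatever is there.
print(effective_parallelism(REACTIVE, 12))
```

In this model, changing the parallelism at runtime is just calling the same
function again with a different `desired` value, which is why only an API is
missing.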
As a concrete use-case, let's say you have an SLA that says jobs must
not be down longer than X seconds, and a TM just crashed.
If you can absolutely guarantee that your k8s cluster can provision a
new TM within X seconds, no matter what cruel reality has in store for
you, then you /may/ not need it.
If you can't, well then here's a use-case for you.
> Last time I looked they implemented the same interface and the same
> base class. Of course, their behavior is quite different.
They have never shared a base class, not even on day 1. Are you maybe mixing up the
AdaptiveScheduler and AdaptiveBatchScheduler?
As for FLINK-30773, I think that should be covered.
On 26/01/2023 17:10, Maximilian Michels wrote:
Thanks for the explanation. If not for the "reactive mode", what is
the advantage of the adaptive scheduler? What other modes does it
support?
Apart from implementing the same interface, the implementations of the adaptive
and default schedulers are separate.
Last time I looked they implemented the same interface and the same
base class. Of course, their behavior is quite different.
I'm still very interested in learning about the future FLIPs
mentioned. Based on the replies, I'm assuming that they will support
the changes required for
https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
the basis for implementing them.
-Max
On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler <ches...@apache.org> wrote:
On 26/01/2023 16:18, Maximilian Michels wrote:
I see slightly different goals for the standard and the adaptive
scheduler. The adaptive scheduler's goal is to adapt the Flink job
according to the available resources.
This is really a misconception that we just have to stomp out.
This statement only applies to reactive mode, a special mode the adaptive
scheduler (AS) can run in, where active resource management is not supported
since requesting infinite resources from k8s doesn't really make sense.
The AS itself can work perfectly fine with active resource management, and has
no effect on how the RM talks to k8s. It can just keep the job running in cases
where k8s provides fewer resources than desired (i.e., than the user-provided
parallelism requires), possibly only temporarily.
On 26/01/2023 16:18, Maximilian Michels wrote:
After all, both schedulers share the same super class
Apart from implementing the same interface, the implementations of the adaptive
and default schedulers are separate.