Hi Gyula,

> can you please explain why the AdaptiveScheduler is not the default
> scheduler?


There are still some smaller bits missing. As far as I know, the missing
parts are:

1) Local recovery (reusing the already downloaded state files after restart
/ rescale)
2) Support for fine-grained resource management
3) Support for the session cluster (Chesnay will be submitting a FLIP for
this soon)

We're looking into addressing all of these limitations in the short term.

Personally, I'd love to start a discussion about making the AdaptiveScheduler
the default one once those limitations are fixed.
Being able to eventually deprecate and remove the DefaultScheduler would
simplify the code base considerably, since there are many adapters between
the new and old interfaces (e.g., SlotPool-related interfaces).
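
For anyone less familiar with the distinction Chesnay draws in the quoted
thread below, here is a rough, hypothetical model of how the two schedulers
differ when settling on a job's parallelism. It is a sketch under simplifying
assumptions (one job-wide parallelism, one slot per parallel instance), not
Flink's actual implementation, and the class and method names are made up for
illustration. The adaptive variant runs with min(desired, available) slots,
reactive mode sets the desired parallelism to effectively infinity, and the
strict variant stands in for the DefaultScheduler's all-or-nothing requirement.

```java
// Hypothetical, simplified model; NOT Flink's scheduler code.
// Assumes a single job-wide parallelism and one slot per parallel instance.
public final class ParallelismSketch {

    // Stand-in for reactive mode: desired parallelism is effectively unbounded.
    public static final int REACTIVE_DESIRED = Integer.MAX_VALUE;

    // Adaptive-style choice: run with whatever is available, capped at the
    // desired parallelism. Returns -1 if there is nothing to run on yet.
    public static int chooseParallelism(int desired, int available) {
        if (available <= 0) {
            return -1; // no slots at all: keep waiting
        }
        return Math.min(desired, available);
    }

    // Strict, default-scheduler-style choice: the job only runs at exactly
    // the desired parallelism; otherwise it has to wait for more slots.
    public static int chooseParallelismStrict(int desired, int available) {
        return available >= desired ? desired : -1;
    }

    public static void main(String[] args) {
        // A TM crashed: 6 of the 8 desired slots remain.
        System.out.println(chooseParallelism(8, 6));        // 6: keeps running, scaled down
        System.out.println(chooseParallelismStrict(8, 6));  // -1: waits for a replacement TM
        // A replacement TM arrives: 10 slots are now available.
        System.out.println(chooseParallelism(8, 10));       // 8: capped at the desired value
        // Reactive mode uses every slot that is offered.
        System.out.println(chooseParallelism(REACTIVE_DESIRED, 10)); // 10
    }
}
```

In the TM-crash scenario, the adaptive model keeps the job running at
parallelism 6 while the strict model must wait for a new TM; that gap is the
SLA argument Chesnay makes below.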

Best,
D.

On Thu, Jan 26, 2023 at 6:27 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Chesnay,
>
> It seems like you are suggesting that the adaptive scheduler does everything
> the standard scheduler does and more.
>
> I am clearly not an expert on this topic, but can you please explain why the
> AdaptiveScheduler is not the default scheduler?
> If it can do everything, why do we even have two schedulers? Why not simply
> drop the "old" one?
>
> That would probably clear up all the confusion then :)
>
> Gyula
>
> On Thu, Jan 26, 2023 at 6:23 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
> > There's the default mode and reactive mode; nothing else.
> > At its core they are the same thing; reactive mode just cranks up the
> > desired parallelism to infinity and enforces certain assumptions (e.g.,
> > no active resource management).
> >
> > The advantage is that the adaptive scheduler can run jobs even when
> > insufficient resources are available, and scale them up again once more
> > resources become available.
> > This is its core functionality, but we always intended to extend it
> > such that users can modify the parallelism at runtime as well.
> > And since the AS can already rescale jobs (and was purpose-built with
> > that functionality in mind), this is just a matter of exposing an API
> > for it. Everything else is already there.
> >
> > As a concrete use-case, let's say you have an SLA that says jobs must
> > not be down longer than X seconds, and a TM just crashed.
> > If you can absolutely guarantee that your k8s cluster can provision a
> > new TM within X seconds, no matter what cruel reality has in store for
> > you, then you /may/ not need it.
> > If you can't, well then here's a use-case for you.
> >
> >  > Last time I looked they implemented the same interface and the same
> > base class. Of course, their behavior is quite different.
> >
> > They have never shared a base class, not even on day 1. Are you maybe
> > mixing up the AdaptiveScheduler and AdaptiveBatchScheduler?
> >
> > As for FLINK-30773, I think that should be covered.
> >
> > On 26/01/2023 17:10, Maximilian Michels wrote:
> > > Thanks for the explanation. If not for the "reactive mode", what is
> > > the advantage of the adaptive scheduler? What other modes does it
> > > support?
> > >
> > >> Apart from implementing the same interface, the implementations of the
> > >> adaptive and default schedulers are separate.
> > > Last time I looked they implemented the same interface and the same
> > > base class. Of course, their behavior is quite different.
> > >
> > > I'm still very interested in learning about the future FLIPs
> > > mentioned. Based on the replies, I'm assuming that they will support
> > > the changes required for
> > > https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
> > > the basis for implementing them.
> > >
> > > -Max
> > >
> > > On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler <ches...@apache.org>
> > > wrote:
> > >> On 26/01/2023 16:18, Maximilian Michels wrote:
> > >>
> > >> I see slightly different goals for the standard and the adaptive
> > >> scheduler. The adaptive scheduler's goal is to adapt the Flink job
> > >> according to the available resources.
> > >>
> > >> This is really a misconception that we just have to stomp out.
> > >>
> > >> This statement only applies to reactive mode, a special mode the
> > >> adaptive scheduler (AS) can run in, where active resource management is
> > >> not supported, since requesting infinite resources from k8s doesn't
> > >> really make sense.
> > >>
> > >> The AS itself works perfectly fine with active resource management and
> > >> has no effect on how the RM talks to k8s. It can simply keep the job
> > >> running in cases where k8s provides fewer resources than desired
> > >> (== the user-provided parallelism), possibly only temporarily.
> > >>
> > >> On 26/01/2023 16:18, Maximilian Michels wrote:
> > >>
> > >> After all, both schedulers share the same superclass
> > >>
> > >> Apart from implementing the same interface, the implementations of the
> > >> adaptive and default schedulers are separate.
> >
> >
>
