Chesnay,

It seems like you are suggesting that the adaptive scheduler does everything the standard scheduler does and more.
I am clearly not an expert on this topic, but can you please explain why the AdaptiveScheduler is not the default scheduler? If it can do everything, why do we even have two schedulers? Why not simply drop the "old" one? That would probably clear up all the confusion then :)

Gyula

On Thu, Jan 26, 2023 at 6:23 PM Chesnay Schepler <ches...@apache.org> wrote:

> There's the default and reactive mode; nothing else.
> At its core they are the same thing; reactive mode just cranks up the
> desired parallelism to infinity and enforces certain assumptions (e.g.,
> no active resource management).
>
> The advantage is that the adaptive scheduler can run jobs while
> insufficient resources are available, and scale things up again once they
> are available.
> This is its core functionality, but we always intended to extend it
> such that users can modify the parallelism at runtime as well.
> And since the AS can already rescale jobs (and was purpose-built with
> that functionality in mind), this is just a matter of exposing an API
> for it. Everything else is already there.
>
> As a concrete use-case, let's say you have an SLA that says jobs must
> not be down longer than X seconds, and a TM just crashed.
> If you can absolutely guarantee that your k8s cluster can provision a
> new TM within X seconds, no matter what cruel reality has in store for
> you, then you /may/ not need it.
> If you can't, well then here's a use-case for you.
>
> > Last time I looked they implemented the same interface and the same
> > base class. Of course, their behavior is quite different.
>
> They never shared a base class since day 1. Are you maybe mixing up the
> AdaptiveScheduler and AdaptiveBatchScheduler?
>
> As for FLINK-30773, I think that should be covered.
>
> On 26/01/2023 17:10, Maximilian Michels wrote:
> > Thanks for the explanation. If not for the "reactive mode", what is
> > the advantage of the adaptive scheduler? What other modes does it
> > support?
> >
> >> Apart from implementing the same interface the implementations of the
> >> adaptive and default schedulers are separate.
> > Last time I looked they implemented the same interface and the same
> > base class. Of course, their behavior is quite different.
> >
> > I'm still very interested in learning about the future FLIPs
> > mentioned. Based on the replies, I'm assuming that they will support
> > the changes required for
> > https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
> > the basis for implementing them.
> >
> > -Max
> >
> > On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler <ches...@apache.org> wrote:
> >> On 26/01/2023 16:18, Maximilian Michels wrote:
> >>
> >> I see slightly different goals for the standard and the adaptive
> >> scheduler. The adaptive scheduler's goal is to adapt the Flink job
> >> according to the available resources.
> >>
> >> This is really a misconception that we just have to stomp out.
> >>
> >> This statement only applies to reactive mode, a special mode in which
> >> the adaptive scheduler (AS) can run, where active resource management
> >> is not supported since requesting infinite resources from k8s doesn't
> >> really make sense.
> >>
> >> The AS itself can work perfectly fine with active resource management,
> >> and has no effect on how the RM talks to k8s. It can just keep the job
> >> running in cases where less than desired (==user-provided parallelism)
> >> resources are provided by k8s (possibly temporarily).
> >>
> >> On 26/01/2023 16:18, Maximilian Michels wrote:
> >>
> >> After all, both schedulers share the same super class
> >>
> >> Apart from implementing the same interface the implementations of the
> >> adaptive and default schedulers are separate.
> >