Hi Gyula,
> can you please explain why the AdaptiveScheduler is not the default
> scheduler?

There are still some smaller bits missing. As far as I know, the missing
parts are:

1) Local recovery (reusing the already downloaded state files after
   restart / rescale)
2) Support for fine-grained resource management
3) Support for the session cluster (Chesnay will be submitting a FLIP for
   this soon)

We're looking into addressing all of these limitations in the short term.

Personally, I'd love to start a discussion about making the
AdaptiveScheduler the default scheduler once those limitations are fixed.
Being able to eventually deprecate and remove the DefaultScheduler would
simplify the code base considerably, since there are many adapters between
the new and old interfaces (e.g. SlotPool-related interfaces).

Best,
D.

On Thu, Jan 26, 2023 at 6:27 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Chesnay,
>
> Seems like you are suggesting that the Adaptive scheduler does everything
> the standard scheduler does and more.
>
> I am clearly not an expert on this topic but can you please explain why the
> AdaptiveScheduler is not the default scheduler?
> If it can do everything, why do we even have 2 schedulers? Why not simply
> drop the "old" one?
>
> That would probably clear up all confusion then :)
>
> Gyula
>
> On Thu, Jan 26, 2023 at 6:23 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
> > There's the default and reactive mode; nothing else.
> > At its core they are the same thing; reactive mode just cranks up the
> > desired parallelism to infinity and enforces certain assumptions (e.g.,
> > no active resource management).
> >
> > The advantage is that the adaptive scheduler can run jobs while
> > insufficient resources are available, and scale things up again once they
> > are available.
> > This is its core functionality, but we always intended to extend it
> > such that users can modify the parallelism at runtime as well.
> > And since the AS can already rescale jobs (and was purpose-built with
> > that functionality in mind), this is just a matter of exposing an API
> > for it. Everything else is already there.
> >
> > As a concrete use-case, let's say you have an SLA that says jobs must
> > not be down longer than X seconds, and a TM just crashed.
> > If you can absolutely guarantee that your k8s cluster can provision a
> > new TM within X seconds, no matter what cruel reality has in store for
> > you, then you /may/ not need it.
> > If you can't, well then here's a use-case for you.
> >
> > > Last time I looked they implemented the same interface and the same
> > > base class. Of course, their behavior is quite different.
> >
> > They never shared a base class since day 1. Are you maybe mixing up the
> > AdaptiveScheduler and AdaptiveBatchScheduler?
> >
> > As for FLINK-30773, I think that should be covered.
> >
> > On 26/01/2023 17:10, Maximilian Michels wrote:
> > > Thanks for the explanation. If not for the "reactive mode", what is
> > > the advantage of the adaptive scheduler? What other modes does it
> > > support?
> > >
> > >> Apart from implementing the same interface the implementations of the
> > >> adaptive and default schedulers are separate.
> > > Last time I looked they implemented the same interface and the same
> > > base class. Of course, their behavior is quite different.
> > >
> > > I'm still very interested in learning about the future FLIPs
> > > mentioned. Based on the replies, I'm assuming that they will support
> > > the changes required for
> > > https://issues.apache.org/jira/browse/FLINK-30773, or at least provide
> > > the basis for implementing them.
> > >
> > > -Max
> > >
> > > On Thu, Jan 26, 2023 at 4:57 PM Chesnay Schepler <ches...@apache.org>
> > > wrote:
> > >> On 26/01/2023 16:18, Maximilian Michels wrote:
> > >>
> > >> I see slightly different goals for the standard and the adaptive
> > >> scheduler.
> > >> The adaptive scheduler's goal is to adapt the Flink job
> > >> according to the available resources.
> > >>
> > >> This is really a misconception that we just have to stomp out.
> > >>
> > >> This statement only applies to reactive mode, a special mode in which
> > >> the adaptive scheduler (AS) can run, where active resource management
> > >> is not supported since requesting infinite resources from k8s doesn't
> > >> really make sense.
> > >>
> > >> The AS itself can work perfectly fine with active resource management,
> > >> and has no effect on how the RM talks to k8s. It can just keep the job
> > >> running in cases where fewer resources than desired (== user-provided
> > >> parallelism) are provided by k8s (possibly temporarily).
> > >>
> > >> On 26/01/2023 16:18, Maximilian Michels wrote:
> > >>
> > >> After all, both schedulers share the same super class
> > >>
> > >> Apart from implementing the same interface the implementations of the
> > >> adaptive and default schedulers are separate.