Pretty much, except that with Flink 1.18 the autoscaler can scale the job in place without restarting the JM (even without reactive mode).
So actually the best option is the autoscaler with Flink 1.18 native mode (no reactive).

Gyula

On Fri, 1 Sep 2023 at 13:54, Dennis Jung <inylov...@gmail.com> wrote:

> Thanks for the feedback.
> Could you check whether I understand correctly?
>
> *Only using 'reactive' mode:*
> By manually adding a TaskManager (TM) (e.g. using './bin/taskmanager.sh start'), the parallelism will be increased. For example, when the job parallelism is 1 and there is 1 TM, adding 1 new TM will restart the JobManager and the parallelism will become 2.
> But the number of TMs is not controlled automatically.
>
> *Autoscaler + non-reactive:*
> It can flexibly control the number of TMs based on several metrics (CPU usage, throughput, ...), and the JobManager will be restarted when scaling. But the job parallelism stays the same after the number of TMs has changed.
>
> *Autoscaler + 'reactive' mode:*
> It can control the number of TMs based on metrics, and increase/decrease the job parallelism by changing the TMs.
>
> Regards,
> Jung
>
> On Fri, 1 Sep 2023 at 8:16 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> I would look at reactive scaling as a way to increase / decrease parallelism.
>>
>> It's not a way to automatically decide when to actually do it, as you need to create new TMs.
>>
>> The autoscaler could use reactive mode to change the parallelism, but you need the autoscaler itself to decide when new resources should be added.
>>
>> On Fri, 1 Sep 2023 at 13:09, Dennis Jung <inylov...@gmail.com> wrote:
>>
>>> For now, what I've found about 'reactive' mode is that it automatically adjusts the job parallelism when TaskManagers are added/removed.
>>>
>>> https://www.slideshare.net/FlinkForward/autoscaling-flink-with-reactive-mode
>>>
>>> Is there some other scaling feature that only 'reactive' mode offers?
>>>
>>> Thanks.
>>> Regards.
>>>
>>> On Fri, 1 Sep 2023 at 4:56 PM, Dennis Jung <inylov...@gmail.com> wrote:
>>>
>>>> Hello,
>>>> Thank you for your response.
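Dennis's first case ("job parallelism is 1 and TM is 1, add 1 TM, parallelism becomes 2") can be made concrete with a rough model. It assumes default slot sharing (one slot-sharing group, so each operator runs at most one subtask per slot) and a fixed number of slots per TaskManager; under those assumptions Reactive Mode uses every available slot, bounded by the maximum parallelism. `reactive_parallelism` is a hypothetical helper for illustration, not a Flink API, and it models only the resulting number, not the restart that applies it.

```python
def reactive_parallelism(num_task_managers: int, slots_per_tm: int, max_parallelism: int) -> int:
    """Parallelism Reactive Mode would aim for, in this simplified model.

    Hypothetical helper: assumes one slot-sharing group and equal slots per TM.
    """
    if num_task_managers < 0 or slots_per_tm < 1 or max_parallelism < 1:
        raise ValueError("invalid cluster description")
    # Total slots offered by the cluster.
    available_slots = num_task_managers * slots_per_tm
    # Use every available slot, but never exceed the max parallelism.
    return min(available_slots, max_parallelism)


# Dennis's example, assuming one slot per TaskManager and an arbitrary
# max parallelism of 128:
print(reactive_parallelism(num_task_managers=1, slots_per_tm=1, max_parallelism=128))  # 1
print(reactive_parallelism(num_task_managers=2, slots_per_tm=1, max_parallelism=128))  # 2
```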
>>>> I have a few more questions about the following:
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/elastic_scaling/
>>>>
>>>> *Reactive Mode configures a job so that it always uses all resources available in the cluster. Adding a TaskManager will scale up your job, removing resources will scale it down. Flink will manage the parallelism of the job, always setting it to the highest possible values.*
>>>> => Does this mean that when I add/remove a TaskManager in 'non-reactive' mode, the resources (CPU/memory/etc.) of the cluster do not change?
>>>>
>>>> *Reactive Mode restarts a job on a rescaling event, restoring it from the latest completed checkpoint. This means that there is no overhead of creating a savepoint (which is needed for manually rescaling a job). Also, the amount of data that is reprocessed after rescaling depends on the checkpointing interval, and the restore time depends on the state size.*
>>>> => As far as I know, 'rescaling' also works in non-reactive mode, by restoring from a checkpoint. What is the difference when using 'reactive' here?
>>>>
>>>> *The Reactive Mode allows Flink users to implement a powerful autoscaling mechanism, by having an external service monitor certain metrics, such as consumer lag, aggregate CPU utilization, throughput or latency. As soon as these metrics are above or below a certain threshold, additional TaskManagers can be added or removed from the Flink cluster.*
>>>> => Why is this only possible in 'reactive' mode? It seems more related to the 'autoscaler'. Are there specific features/APIs that can control TaskManagers/parallelism only in 'reactive' mode?
>>>>
>>>> Thank you.
>>>>
>>>> On Fri, 1 Sep 2023 at 3:30 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>
>>>>> The reactive mode reacts to available resources. The autoscaler reacts to changing load and processing capacity and adjusts resources.
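The last documentation passage describes a division of labour: an external service decides how many TaskManagers to run, and Reactive Mode turns the available slots into parallelism. A minimal sketch of the external decision step follows, assuming average CPU utilization is the only metric. The thresholds, the `Thresholds` policy, and `desired_task_managers` are all hypothetical; consumer lag, throughput, or latency could be plugged in the same way.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical scaling policy for an external monitoring service."""
    scale_up_cpu: float = 0.8    # add a TaskManager above 80% average CPU
    scale_down_cpu: float = 0.3  # remove a TaskManager below 30% average CPU
    min_tms: int = 1
    max_tms: int = 10


def desired_task_managers(current_tms: int, avg_cpu: float, policy: Thresholds) -> int:
    """Return the TaskManager count the external service would request."""
    if avg_cpu > policy.scale_up_cpu:
        target = current_tms + 1
    elif avg_cpu < policy.scale_down_cpu:
        target = current_tms - 1
    else:
        target = current_tms  # inside the band: leave the cluster alone
    # Keep the request within the allowed cluster size.
    return max(policy.min_tms, min(policy.max_tms, target))


policy = Thresholds()
print(desired_task_managers(2, 0.9, policy))  # 3: overloaded, add one TM
print(desired_task_managers(2, 0.1, policy))  # 1: idle, remove one TM
print(desired_task_managers(3, 0.5, policy))  # 3: within band, no change
```

Acting on the number (for example, resizing the TaskManager deployment) is left out; with Reactive Mode, Flink would then pick up the new slots and rescale the job itself.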
>>>>>
>>>>> Completely different concepts and applicability.
>>>>> Most people want the autoscaler, but this is a recent feature and is specific to the k8s operator at the moment.
>>>>>
>>>>> Gyula
>>>>>
>>>>> On Fri, 1 Sep 2023 at 04:50, Dennis Jung <inylov...@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> Then what is the purpose of using 'reactive', if it doesn't do anything by itself?
>>>>>> What is the difference if I use the autoscaler without 'reactive' mode?
>>>>>>
>>>>>> Regards,
>>>>>> Jung
>>>>>>
>>>>>> On Fri, 18 Aug 2023 at 7:51 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi!
>>>>>>>
>>>>>>> I think what you need is probably not the reactive mode but a proper autoscaler. The reactive mode, as you say, doesn't do anything by itself; you need to build a lot of logic around it.
>>>>>>>
>>>>>>> Check this instead:
>>>>>>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/
>>>>>>>
>>>>>>> The Kubernetes Operator has a built-in autoscaler that can scale jobs based on Kafka data rate / processing throughput. It also doesn't rely on the reactive mode.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Gyula
>>>>>>>
>>>>>>> On Fri, Aug 18, 2023 at 12:43 PM Dennis Jung <inylov...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>> Sorry for the frequent questions. This is a question about 'reactive' mode.
>>>>>>>>
>>>>>>>> 1. As far as I understand, even though I've set `scheduler-mode: reactive`, it will not change the parallelism automatically by itself based on CPU usage or Kafka consumer rate. It needs an additional resource-monitoring feature (such as the Horizontal Pod Autoscaler, or something else). Is this correct?
>>>>>>>> 2. Is it possible to create a custom resource monitor provider application?
>>>>>>>>    For example, if I want to increase/decrease the parallelism based on Kafka consumer rate, do I need to call a specific API from outside to trigger rescaling?
>>>>>>>> 3. If 2 is correct, what is the difference when using 'reactive' mode? As far as I can tell, calling a specific API will rescale whether or not 'reactive' mode is used... (or does the API only work in this mode?)
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Regards
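Questions 2 and 3 come down to computing a new parallelism from the Kafka consumption rate. A minimal sketch of that computation, loosely in the spirit of the throughput-based scaling Gyula mentions for the Kubernetes operator's autoscaler but not its actual algorithm: assuming the incoming record rate and the rate a single fully busy subtask can sustain are both known, pick the smallest parallelism that keeps each subtask at a chosen target utilization. `target_parallelism`, its parameters, and the 0.7 default are hypothetical.

```python
import math


def target_parallelism(
    input_rate: float,
    rate_per_subtask: float,
    target_utilization: float = 0.7,
    max_parallelism: int = 128,
) -> int:
    """Smallest parallelism that keeps each subtask at or below target utilization.

    input_rate: records/s arriving from Kafka (hypothetical metric source).
    rate_per_subtask: records/s one fully busy subtask can process (assumed measured).
    target_utilization: fraction of capacity to aim for, leaving headroom (assumed).
    """
    if input_rate < 0 or rate_per_subtask <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid rates")
    # Capacity one subtask should provide once headroom is subtracted.
    usable_rate = rate_per_subtask * target_utilization
    needed = math.ceil(input_rate / usable_rate)
    # At least one subtask, and never above the max parallelism.
    return max(1, min(max_parallelism, needed))


print(target_parallelism(7000, 1000, target_utilization=0.5))  # 14
print(target_parallelism(7001, 1000, target_utilization=0.5))  # 15
```

Applying the result (through the operator's autoscaler, by adding TaskManagers for Reactive Mode, or by taking a savepoint and restarting with the new parallelism) is outside this sketch.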