Jung, I don't want to sound unhelpful, but I think the best thing for you to do is simply to try these different modes in your local environment. It should be very easy to get started with the Kubernetes Operator on Kind/Minikube (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/).
It's very difficult to answer these questions fully here. Try the different modes, observe what happens, read the docs, and you will get all the answers.

Gyula

On Thu, Sep 7, 2023 at 10:11 AM Dennis Jung <inylov...@gmail.com> wrote:

> Hello Chen,
> Thanks for your reply! I have some further questions:
>
> 1. With non-reactive mode in Flink 1.18, if the autoscaler adjusts
> parallelism, what difference does using 'reactive' mode make?
> 2. If I use Flink 1.15~1.17 without the autoscaler, is the difference
> with 'reactive' mode that parallelism changes dynamically when the
> number of TMs changes (manually, or via a custom scaler)?
>
> Regards,
> Jung
>
>
> On Tue, Sep 5, 2023 at 3:59 PM, Chen Zhanghao <zhanghao.c...@outlook.com> wrote:
>
>> Hi Dennis,
>>
>>
>> 1. In Flink 1.18 + non-reactive mode, the autoscaler adjusts the job's
>> parallelism. The job requests extra TMs if the current ones cannot
>> satisfy its needs, and redundant TMs are released automatically later
>> once they are idle. In other words, parallelism changes cause the
>> number of TMs to change.
>> 2. The core metric used is busy time (the amount of time spent on
>> task processing per second = 1 s - backpressured time - idle time).
>> It is considered superior because it takes I/O cost etc. into account
>> as well. Also, the metric has per-task granularity, which allows us
>> to identify bottleneck tasks.
>> 3. The autoscaler feature currently only works with the K8s operator +
>> native K8s mode.
>>
>>
>> Best,
>> Zhanghao Chen
>> ------------------------------
>> *From:* Dennis Jung <inylov...@gmail.com>
>> *Sent:* September 2, 2023 12:58
>> *To:* Gyula Fóra <gyula.f...@gmail.com>
>> *Cc:* user@flink.apache.org <user@flink.apache.org>
>> *Subject:* Re: [Question] How to scale application based on 'reactive' mode
>>
>> Hello,
>> Thanks for your notice.
>>
>> 1. In "Flink 1.18 + non-reactive", is parallelism changed by the
>> number of TMs?
>> 2. The document (
>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.6/docs/custom-resource/autoscaler/)
>> says "we are not using any container memory / CPU utilization metrics
>> directly here". Which metrics does it use internally?
>> 3. I'm using standalone k8s (
>> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/)
>> for deployment. Is the autoscaler feature only available when using
>> the "flink k8s operator" (sorry, I don't understand this clearly yet...)?
>>
>> Regards
>>
>>
>> On Fri, Sep 1, 2023 at 10:20 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>> Pretty much, except that with Flink 1.18 the autoscaler can scale the
>> job in place without restarting the JM (even without reactive mode).
>>
>> So actually the best option is the autoscaler with Flink 1.18 native
>> mode (no reactive).
>>
>> Gyula
>>
>> On Fri, 1 Sep 2023 at 13:54, Dennis Jung <inylov...@gmail.com> wrote:
>>
>> Thanks for the feedback.
>> Could you check whether I understand correctly?
>>
>> *Only using 'reactive' mode:*
>> Manually adding a TaskManager (TM) (for example with
>> './bin/taskmanager.sh start') increases parallelism. For example, when
>> job parallelism is 1 and there is 1 TM, adding 1 new TM restarts the
>> JobManager and parallelism becomes 2.
>> But the number of TMs is not controlled automatically.
>>
>> *Autoscaler + non-reactive:*
>> It can flexibly control the number of TMs based on several metrics
>> (CPU usage, throughput, ...), and the JobManager is restarted when
>> scaling. But job parallelism stays the same after the number of TMs
>> has changed.
>>
>> *Autoscaler + 'reactive' mode*:
>> It can control the number of TMs based on metrics, and
>> increase/decrease job parallelism by changing the number of TMs.
>>
>> Regards,
>> Jung
>>
>> On Fri, Sep 1, 2023 at 8:16 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>> I would look at reactive scaling as a way to increase / decrease
>> parallelism.
>>
>> It's not a way to automatically decide when to actually do it, as you
>> need to create new TMs.
>>
>> The autoscaler could use reactive mode to change the parallelism, but
>> you need the autoscaler itself to decide when new resources should be
>> added.
>>
>> On Fri, 1 Sep 2023 at 13:09, Dennis Jung <inylov...@gmail.com> wrote:
>>
>> For now, what I've found about 'reactive' mode is that it
>> automatically adjusts job parallelism when the number of TaskManagers
>> is increased/decreased.
>>
>> https://www.slideshare.net/FlinkForward/autoscaling-flink-with-reactive-mode
>>
>> Is there some other feature that only 'reactive' mode offers for scaling?
>>
>> Thanks.
>> Regards.
>>
>>
>> On Fri, Sep 1, 2023 at 4:56 PM, Dennis Jung <inylov...@gmail.com> wrote:
>>
>> Hello,
>> Thank you for your response. I have a few more questions about the following:
>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/elastic_scaling/
>>
>> *Reactive Mode configures a job so that it always uses all resources
>> available in the cluster. Adding a TaskManager will scale up your job,
>> removing resources will scale it down. Flink will manage the parallelism of
>> the job, always setting it to the highest possible values.*
>> => Does this mean that when I add/remove a TaskManager in
>> 'non-reactive' mode, the resources (CPU/memory/etc.) of the cluster
>> do not change?
>>
>> *Reactive Mode restarts a job on a rescaling event, restoring it from the
>> latest completed checkpoint. This means that there is no overhead of
>> creating a savepoint (which is needed for manually rescaling a job). Also,
>> the amount of data that is reprocessed after rescaling depends on the
>> checkpointing interval, and the restore time depends on the state size.*
>> => As far as I know, 'rescaling' also works in non-reactive mode, by
>> restoring from a checkpoint. What is the difference when using
>> 'reactive' here?
>>
>> *The Reactive Mode allows Flink users to implement a powerful autoscaling
>> mechanism, by having an external service monitor certain metrics, such as
>> consumer lag, aggregate CPU utilization, throughput or latency. As soon as
>> these metrics are above or below a certain threshold, additional
>> TaskManagers can be added or removed from the Flink cluster.*
>> => Why is this only possible in 'reactive' mode? It seems more related
>> to the 'autoscaler'. Are there specific features/APIs that can control
>> TaskManagers/parallelism only in 'reactive' mode?
>>
>> Thank you.
>>
>> On Fri, Sep 1, 2023 at 3:30 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>> The reactive mode reacts to available resources. The autoscaler reacts
>> to changing load and processing capacity and adjusts resources.
>>
>> These are completely different concepts with different applicability.
>> Most people want the autoscaler, but this is a recent feature and is
>> specific to the k8s operator at the moment.
>>
>> Gyula
>>
>> On Fri, 1 Sep 2023 at 04:50, Dennis Jung <inylov...@gmail.com> wrote:
>>
>> Hello,
>> Thanks for your notice.
>>
>> Then what is the purpose of using 'reactive', if it doesn't do
>> anything by itself?
>> What is the difference if I use the autoscaler without 'reactive' mode?
>>
>> Regards,
>> Jung
>>
>>
>> On Fri, Aug 18, 2023 at 7:51 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>> Hi!
>>
>> I think what you need is probably not the reactive mode but a proper
>> autoscaler. The reactive mode, as you say, doesn't do anything in
>> itself; you need to build a lot of logic around it.
>>
>> Check this instead:
>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/
>>
>> The Kubernetes Operator has a built-in autoscaler that can scale jobs
>> based on Kafka data rate / processing throughput. It also doesn't rely
>> on the reactive mode.
>>
>> Cheers,
>> Gyula
>>
>> On Fri, Aug 18, 2023 at 12:43 PM Dennis Jung <inylov...@gmail.com> wrote:
>>
>> Hello,
>> Sorry for the frequent questions. This is a question about 'reactive' mode.
>>
>> 1. As far as I understand, even though I've set up
>> `scheduler-mode: reactive`, it will not change parallelism
>> automatically by itself based on CPU usage or Kafka consumer rate. It
>> needs an additional resource monitor (such as the Horizontal Pod
>> Autoscaler or similar). Is this correct?
>> 2. Is it possible to create a custom resource monitor application? For
>> example, if I want to increase/decrease parallelism based on Kafka
>> consumer rate, do I need to call a specific API from outside to
>> trigger rescaling?
>> 3. If 2 is correct, what is the difference when using 'reactive' mode?
>> As far as I can tell, calling a specific API will rescale whether or
>> not 'reactive' mode is used... (or does the API only work in this mode)?
>>
>> Thanks.
>>
>> Regards
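
For anyone reading this thread later: the busy-time definition Chen gives above (busy time per second = 1 s - backpressured time - idle time) can be worked through with a small Python sketch. This only illustrates the arithmetic and the idea of picking the busiest task as the likely bottleneck. It is not the operator's actual autoscaling algorithm, and the function names and sample numbers are made up for illustration.

```python
from typing import Dict, Tuple


def busy_time_ms(backpressured_ms: float, idle_ms: float) -> float:
    """Busy time within one second, in milliseconds.

    Follows the definition from the thread:
    busy = 1 s - backpressured time - idle time.
    The result is clamped to [0, 1000] to tolerate noisy inputs.
    """
    return max(0.0, min(1000.0, 1000.0 - backpressured_ms - idle_ms))


def likely_bottleneck(tasks: Dict[str, Tuple[float, float]]) -> str:
    """Return the task with the highest busy time.

    `tasks` maps a (hypothetical) task name to a pair of
    (backpressured ms per second, idle ms per second).
    """
    return max(tasks, key=lambda name: busy_time_ms(*tasks[name]))


# Made-up sample values, not measurements from a real job.
sample = {
    "source": (700.0, 100.0),  # mostly backpressured by a slow downstream task
    "window": (50.0, 50.0),    # busy almost all of the time
    "sink": (0.0, 900.0),      # mostly idle
}

for name, (backpressured, idle) in sample.items():
    print(name, busy_time_ms(backpressured, idle))
print("likely bottleneck:", likely_bottleneck(sample))
```

With these sample values the source is heavily backpressured and the sink is mostly idle, so the window task comes out as the busiest one. That is the per-task view Chen mentions for identifying bottleneck tasks.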