Hello,
Thanks for the feedback. I'll start with these.

Regards
On Thu, Sep 7, 2023 at 7:08 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> Jung,
> I don't want to sound unhelpful, but I think the best thing for you to do
> is simply to try these different modes in your local env.
> It should be very easy to get started with the Kubernetes Operator on
> Kind/Minikube
> (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/try-flink-kubernetes-operator/quick-start/).
>
> It's very difficult to answer these questions fully here. Try the
> different modes, observe what happens, read the docs and you will get all
> the answers.
>
> Gyula
>
> On Thu, Sep 7, 2023 at 10:11 AM Dennis Jung <inylov...@gmail.com> wrote:
>
>> Hello Chen,
>> Thanks for your reply! I have some further questions:
>>
>> 1. With non-reactive mode in Flink 1.18, if the autoscaler adjusts
>> parallelism, what difference does using 'reactive' mode make?
>> 2. If I use Flink 1.15~1.17 without the autoscaler, is the difference with
>> 'reactive' mode that parallelism changes dynamically as the number of TMs
>> changes (manually, or via a custom scaler)?
>>
>> Regards,
>> Jung
>>
>>
>> On Tue, Sep 5, 2023 at 3:59 PM, Chen Zhanghao <zhanghao.c...@outlook.com> wrote:
>>
>>> Hi Dennis,
>>>
>>>
>>> 1. In Flink 1.18 + non-reactive mode, the autoscaler adjusts the job's
>>> parallelism, and the job requests extra TMs if the current ones
>>> cannot satisfy its needs; redundant TMs are released automatically
>>> later once they become idle. In other words, parallelism changes cause
>>> the number of TMs to change.
>>> 2. The core metric used is busy time (the amount of time spent on
>>> task processing per second = 1 s - backpressured time - idle time). It is
>>> considered superior because it takes I/O cost etc. into account as well.
>>> Also, the metric has per-task granularity, which allows us to identify
>>> bottleneck tasks.
>>> 3. The autoscaler feature currently only works with the K8s operator + native
>>> K8s mode.
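The busy-time definition in point 2 can be illustrated with a small sketch. It assumes per-second samples of Flink's `backPressuredTimeMsPerSecond` and `idleTimeMsPerSecond` task metrics have already been collected somehow (Flink also reports `busyTimeMsPerSecond` directly; this just reproduces the arithmetic). The `TaskSample` type, the function names, and the numbers are hypothetical. The busiest task is flagged as the likely bottleneck.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """One per-second sample of a task's time breakdown, in milliseconds.

    Fields mirror Flink's backPressuredTimeMsPerSecond and idleTimeMsPerSecond
    task metrics; how they are collected is left out (assumption).
    """
    name: str
    back_pressured_ms: float
    idle_ms: float


def busy_time_ms(sample: TaskSample) -> float:
    """Busy time per second = 1 s - backpressured time - idle time."""
    busy = 1000.0 - sample.back_pressured_ms - sample.idle_ms
    # Clamp to [0, 1000]: sampled metrics may not add up to exactly 1 s.
    return max(0.0, min(1000.0, busy))


def bottleneck(samples: list[TaskSample]) -> str:
    """Return the task that spends the largest share of each second working."""
    return max(samples, key=busy_time_ms).name


# Hypothetical numbers: the source is backpressured and the sink is mostly
# idle, so the operator in between is the one that cannot keep up.
samples = [
    TaskSample("source", back_pressured_ms=700.0, idle_ms=0.0),
    TaskSample("enrich", back_pressured_ms=0.0, idle_ms=50.0),
    TaskSample("sink", back_pressured_ms=0.0, idle_ms=600.0),
]
for s in samples:
    print(f"{s.name}: busy {busy_time_ms(s):.0f} ms/s")
print("bottleneck:", bottleneck(samples))
```

A task that is backpressured is waiting on something downstream, so low busy time there does not mean spare capacity is needed upstream; the task with the highest busy time is the one worth scaling.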
>>>
>>>
>>> Best,
>>> Zhanghao Chen
>>> ------------------------------
>>> *From:* Dennis Jung <inylov...@gmail.com>
>>> *Sent:* September 2, 2023 12:58
>>> *To:* Gyula Fóra <gyula.f...@gmail.com>
>>> *Cc:* user@flink.apache.org <user@flink.apache.org>
>>> *Subject:* Re: [Question] How to scale application based on 'reactive' mode
>>>
>>> Hello,
>>> Thanks for your notice.
>>>
>>> 1. In "Flink 1.18 + non-reactive", is parallelism changed according to the
>>> number of TMs?
>>> 2. The document
>>> (https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.6/docs/custom-resource/autoscaler/)
>>> says "we are not using any container memory / CPU utilization metrics
>>> directly here". Which metrics does it use internally?
>>> 3. I'm using standalone k8s
>>> (https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/)
>>> for deployment. Are the autoscaler features only available when using the
>>> "flink k8s operator" (sorry, I don't understand this clearly yet...)?
>>>
>>> Regards
>>>
>>>
>>> On Fri, Sep 1, 2023 at 10:20 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>
>>> Pretty much, except that with Flink 1.18 the autoscaler can scale the job
>>> in place without restarting the JM (even without reactive mode).
>>>
>>> So actually the best option is the autoscaler with Flink 1.18 native mode
>>> (no reactive).
>>>
>>> Gyula
>>>
>>> On Fri, 1 Sep 2023 at 13:54, Dennis Jung <inylov...@gmail.com> wrote:
>>>
>>> Thanks for the feedback.
>>> Could you check whether I understand correctly?
>>>
>>> *Only using 'reactive' mode:*
>>> By manually adding a TaskManager (TM) (for example with './bin/taskmanager.sh
>>> start'), parallelism will be increased. For example, when job parallelism
>>> is 1 and there is 1 TM, adding 1 new TM restarts the JobManager and
>>> parallelism becomes 2.
>>> But the number of TMs is not controlled automatically.
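The "only reactive mode" case can be modeled as a sketch: reactive mode sets the job's parallelism as high as the available slots allow. This assumes all operators share one slot-sharing group (so each slot hosts one subtask of every operator) and that the result is capped by the job's max parallelism; the function name and the slot counts are hypothetical.

```python
def reactive_parallelism(num_task_managers: int, slots_per_tm: int,
                         max_parallelism: int) -> int:
    """Parallelism a job ends up with under reactive mode (simplified model).

    Assumes a single slot-sharing group, so every free slot can run one
    subtask of each operator; the job's max parallelism caps the result.
    """
    if num_task_managers < 0 or slots_per_tm < 1 or max_parallelism < 1:
        raise ValueError("invalid cluster description")
    available_slots = num_task_managers * slots_per_tm
    return min(available_slots, max_parallelism)


# The example above, assuming one slot per TaskManager:
print(reactive_parallelism(1, 1, max_parallelism=128))  # 1 TM  -> parallelism 1
print(reactive_parallelism(2, 1, max_parallelism=128))  # 2 TMs -> parallelism 2
# Slots beyond the max parallelism stay unused.
print(reactive_parallelism(8, 2, max_parallelism=4))    # capped at 4
```

Nothing in this model decides *when* to add a TaskManager; the TM count is an input, which matches the point that something outside reactive mode has to change it.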
>>>
>>> *Autoscaler + non-reactive:*
>>> It can flexibly control the number of TMs based on several metrics (CPU
>>> usage, throughput, ...), and the JobManager will be restarted when scaling.
>>> But job parallelism stays the same after the number of TMs has changed.
>>>
>>> *Autoscaler + 'reactive' mode*:
>>> It can control the number of TMs based on metrics, and increase/decrease
>>> job parallelism by changing the number of TMs.
>>>
>>> Regards,
>>> Jung
>>>
>>> On Fri, Sep 1, 2023 at 8:16 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>
>>> I would look at reactive scaling as a way to increase / decrease
>>> parallelism.
>>>
>>> It's not a way to automatically decide when to actually do it, as you
>>> need to create new TMs.
>>>
>>> The autoscaler could use reactive mode to change the parallelism, but you
>>> need the autoscaler itself to decide when new resources should be added.
>>>
>>> On Fri, 1 Sep 2023 at 13:09, Dennis Jung <inylov...@gmail.com> wrote:
>>>
>>> So far, what I've found about 'reactive' mode is that it automatically
>>> adjusts 'job parallelism' when the number of TaskManagers is
>>> increased/decreased.
>>>
>>> https://www.slideshare.net/FlinkForward/autoscaling-flink-with-reactive-mode
>>>
>>> Is there some other feature that only 'reactive' mode offers for scaling?
>>>
>>> Thanks.
>>> Regards.
>>>
>>>
>>> On Fri, Sep 1, 2023 at 4:56 PM, Dennis Jung <inylov...@gmail.com> wrote:
>>>
>>> Hello,
>>> Thank you for your response. I have a few more questions about the following:
>>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/elastic_scaling/
>>>
>>> *Reactive Mode configures a job so that it always uses all resources
>>> available in the cluster. Adding a TaskManager will scale up your job,
>>> removing resources will scale it down. Flink will manage the parallelism of
>>> the job, always setting it to the highest possible values.*
>>> => Does this mean that when I add/remove a TaskManager in 'non-reactive'
>>> mode, the resources (CPU/memory/etc.) of the cluster do not change?
>>>
>>> *Reactive Mode restarts a job on a rescaling event, restoring it from
>>> the latest completed checkpoint. This means that there is no overhead of
>>> creating a savepoint (which is needed for manually rescaling a job). Also,
>>> the amount of data that is reprocessed after rescaling depends on the
>>> checkpointing interval, and the restore time depends on the state size.*
>>> => As far as I know, 'rescaling' also works in non-reactive mode, by
>>> restoring from a checkpoint. What is the difference when using 'reactive'
>>> here?
>>>
>>> *The Reactive Mode allows Flink users to implement a powerful
>>> autoscaling mechanism, by having an external service monitor certain
>>> metrics, such as consumer lag, aggregate CPU utilization, throughput or
>>> latency. As soon as these metrics are above or below a certain threshold,
>>> additional TaskManagers can be added or removed from the Flink cluster.*
>>> => Why is this only possible in 'reactive' mode? It seems more related to
>>> the 'autoscaler'. Are there specific features/APIs that can control
>>> TaskManagers/parallelism only in 'reactive' mode?
>>>
>>> Thank you.
>>>
>>> On Fri, Sep 1, 2023 at 3:30 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>
>>> The reactive mode reacts to available resources. The autoscaler reacts
>>> to changing load and processing capacity and adjusts resources.
>>>
>>> Completely different concepts and applicability.
>>> Most people want the autoscaler, but this is a recent feature and is
>>> specific to the k8s operator at the moment.
>>>
>>> Gyula
>>>
>>> On Fri, 1 Sep 2023 at 04:50, Dennis Jung <inylov...@gmail.com> wrote:
>>>
>>> Hello,
>>> Thanks for your notice.
>>>
>>> Then what is the purpose of using 'reactive' if it doesn't do anything
>>> by itself?
>>> What is the difference if I use the autoscaler without 'reactive' mode?
>>>
>>> Regards,
>>> Jung
>>>
>>>
>>> On Fri, Aug 18, 2023 at 7:51 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>
>>> Hi!
>>>
>>> I think what you need is probably not the reactive mode but a proper
>>> autoscaler. The reactive mode, as you say, doesn't do anything by itself;
>>> you need to build a lot of logic around it.
>>>
>>> Check this instead:
>>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/
>>>
>>> The Kubernetes Operator has a built-in autoscaler that can scale jobs
>>> based on Kafka data rate / processing throughput. It also doesn't rely on
>>> the reactive mode.
>>>
>>> Cheers,
>>> Gyula
>>>
>>> On Fri, Aug 18, 2023 at 12:43 PM Dennis Jung <inylov...@gmail.com>
>>> wrote:
>>>
>>> Hello,
>>> Sorry for the frequent questions. This is a question about 'reactive' mode.
>>>
>>> 1. As far as I understand, even though I've set up `scheduler-mode: reactive`,
>>> it will not change parallelism automatically by itself based on CPU usage or
>>> Kafka consumer rate. It needs an additional resource monitor (such as
>>> the Horizontal Pod Autoscaler, or something else). Is this correct?
>>> 2. Is it possible to create a custom resource monitor provider
>>> application? For example, if I want to increase/decrease parallelism based
>>> on Kafka consumer rate, do I need to call a specific API from outside to
>>> trigger rescaling?
>>> 3. If 2 is correct, what is the difference when using 'reactive' mode?
>>> As far as I can tell, calling a specific API would rescale whether or not
>>> 'reactive' mode is used... (or does the API only work in this mode)?
>>>
>>> Thanks.
>>>
>>> Regards
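The custom-monitor idea in question 2 can be sketched as a capacity calculation. Everything here is hypothetical: the function names, the target utilization, and the assumption that the monitor already knows the topic's input rate plus one subtask's observed rate and busy time. It is a simplified illustration in the spirit of the rate/throughput-based scaling described above, not the Kubernetes operator's actual algorithm. Under reactive mode, the lever such a monitor would pull is the number of TaskManagers (for example, the replica count of a standalone TaskManager Deployment).

```python
import math


def true_processing_rate(observed_records_per_s: float,
                         busy_ms_per_s: float) -> float:
    """Records/s one subtask could handle if it were busy 100% of the time."""
    if not 0 < busy_ms_per_s <= 1000:
        raise ValueError("busy time must be in (0, 1000] ms per second")
    return observed_records_per_s / (busy_ms_per_s / 1000.0)


def required_parallelism(input_rate: float, per_subtask_capacity: float,
                         target_utilization: float, max_parallelism: int) -> int:
    """Subtasks needed so each runs at about target_utilization of capacity."""
    needed = math.ceil(input_rate / (per_subtask_capacity * target_utilization))
    return max(1, min(needed, max_parallelism))


def task_managers_needed(parallelism: int, slots_per_tm: int) -> int:
    """TaskManagers a reactive-mode job needs to reach the given parallelism."""
    return math.ceil(parallelism / slots_per_tm)


# Hypothetical measurements a custom monitor might collect:
capacity = true_processing_rate(observed_records_per_s=2_000, busy_ms_per_s=500)
kafka_input_rate = 30_000.0  # records/s arriving on the topic (assumed)
p = required_parallelism(kafka_input_rate, capacity,
                         target_utilization=0.75, max_parallelism=128)
print(p, task_managers_needed(p, slots_per_tm=4))
```

With these numbers the sketch asks for parallelism 10, i.e. 3 TaskManagers of 4 slots. Reactive mode would then use all 12 available slots, which illustrates why a reactive-mode monitor controls the TaskManager count rather than the parallelism itself.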