Hi Dennis,

  1.  With Flink 1.18 in non-reactive mode, the autoscaler adjusts the job's
parallelism. The job requests extra TMs if the current ones cannot satisfy
its needs, and redundant TMs are released automatically later once they are
idle. In other words, parallelism changes cause the number of TMs to change.
  2.  The core metric used is busy time (the amount of time spent on task
processing per second = 1 s - backpressured time - idle time). It is
considered superior because it also takes I/O cost etc. into account. Also,
the metric is at per-task granularity, which allows us to identify bottleneck
tasks.
  3.  The autoscaler feature currently only works with the K8s operator +
native K8s mode.
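
To make point 2 concrete, here is a minimal sketch of the busy-time formula and how a per-task value could be used to spot bottleneck tasks. The function names, the sample numbers and the 800 ms threshold are assumptions made for illustration only; they are not the operator's actual code or defaults.

```python
def busy_time_ms(backpressured_ms: float, idle_ms: float,
                 window_ms: float = 1000.0) -> float:
    """Busy time per window = window - backpressured time - idle time."""
    busy = window_ms - backpressured_ms - idle_ms
    # Clamp to [0, window] in case the sampled metrics slightly overlap.
    return max(0.0, min(window_ms, busy))


def find_bottlenecks(tasks: dict, threshold_ms: float = 800.0) -> list:
    """Return the names of tasks whose busy time reaches the threshold.

    `tasks` maps a task name to (backpressured_ms, idle_ms) per second.
    The threshold is a hypothetical value chosen for this example.
    """
    return [name for name, (backpressured, idle) in tasks.items()
            if busy_time_ms(backpressured, idle) >= threshold_ms]


# Hypothetical per-second samples: (backpressured ms, idle ms)
sample = {"source": (700.0, 100.0), "map": (50.0, 50.0), "sink": (0.0, 900.0)}
bottlenecks = find_bottlenecks(sample)
```

In this made-up sample the source is mostly backpressured and the sink is mostly idle, so only the map task counts as busy.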

Best,
Zhanghao Chen
________________________________
From: Dennis Jung <inylov...@gmail.com>
Sent: September 2, 2023 12:58
To: Gyula Fóra <gyula.f...@gmail.com>
Cc: user@flink.apache.org <user@flink.apache.org>
Subject: Re: [Question] How to scale application based on 'reactive' mode

Hello,
Thanks for your notice.

1. In "Flink 1.18 + non-reactive", is the parallelism changed by the number
of TMs?
2. The
document(https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.6/docs/custom-resource/autoscaler/)
 says "we are not using any container memory / CPU utilization metrics
directly here". Which metrics does it use internally?
3. I'm using standalone
k8s(https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/)
 for deployment. Is the autoscaler feature only available when using the "flink k8s
operator"(sorry I don't understand this clearly yet...)?

Regards


On Fri, 1 Sep 2023 at 22:20, Gyula Fóra <gyula.f...@gmail.com> wrote:
Pretty much, except that with Flink 1.18 the autoscaler can scale the job in
place without restarting the JM (even without reactive mode).

So actually the best option is the autoscaler with Flink 1.18 in native mode
(no reactive).

Gyula

On Fri, 1 Sep 2023 at 13:54, Dennis Jung <inylov...@gmail.com> wrote:
Thanks for feedback.
Could you check whether I understand correctly?

Only using 'reactive' mode:
By manually adding TaskManager(TM) (such as using './bin/taskmanager.sh 
start'), parallelism will be increased. For example, when job parallelism is 1 
and TM is 1, and if adding 1 new TM, JobManager will be restarted and 
parallelism will be 2.
But the number of TM is not being controlled automatically.
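
As a rough illustration of the example above, the parallelism that reactive mode settles on can be sketched as the total number of available slots. This assumes every TM offers the same number of slots and that the job's max parallelism caps the result; the function name and the default cap of 128 are hypothetical.

```python
def reactive_parallelism(num_taskmanagers: int, slots_per_tm: int = 1,
                         max_parallelism: int = 128) -> int:
    """Parallelism under reactive mode, sketched as 'use every slot'."""
    if num_taskmanagers < 0 or slots_per_tm < 1 or max_parallelism < 1:
        raise ValueError("invalid cluster description")
    # All available slots are used, capped by the job's max parallelism.
    return min(num_taskmanagers * slots_per_tm, max_parallelism)


before = reactive_parallelism(1)  # one TM with one slot
after = reactive_parallelism(2)   # after adding a second TM
```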

Autoscaler + non-reactive:
It can flexibly control the number of TMs based on several metrics (CPU usage,
throughput, ...), and JobManager will be restarted when scaling. But job
parallelism is the same after the number of TMs has been changed.

Autoscaler + 'reactive' mode:
It can control the number of TMs by metrics, and increase/decrease job
parallelism by changing the number of TMs.

Regards,
Jung

On Fri, 1 Sep 2023 at 20:16, Gyula Fóra <gyula.f...@gmail.com> wrote:
I would look at reactive scaling as a way to increase / decrease parallelism.

It’s not a way to automatically decide when to actually do it, as you need to
create new TMs.

The autoscaler could use reactive mode to change the parallelism but you need 
the autoscaler itself to decide when new resources should be added

On Fri, 1 Sep 2023 at 13:09, Dennis Jung <inylov...@gmail.com> wrote:
For now, the thing I've found about 'reactive' mode is that it automatically 
adjusts 'job parallelism' when TaskManager is increased/decreased.

https://www.slideshare.net/FlinkForward/autoscaling-flink-with-reactive-mode

Is there some other feature that only 'reactive' mode offers for scaling?

Thanks.
Regards.



On Fri, 1 Sep 2023 at 16:56, Dennis Jung <inylov...@gmail.com> wrote:
Hello,
Thank you for your response. I have a few more questions about the following:
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/elastic_scaling/

Reactive Mode configures a job so that it always uses all resources available 
in the cluster. Adding a TaskManager will scale up your job, removing resources 
will scale it down. Flink will manage the parallelism of the job, always 
setting it to the highest possible values.
=> Does this mean when I add/remove TaskManager in 'non-reactive' mode, 
resource(CPU/Memory/Etc.) of the cluster is not being changed?

Reactive Mode restarts a job on a rescaling event, restoring it from the latest 
completed checkpoint. This means that there is no overhead of creating a 
savepoint (which is needed for manually rescaling a job). Also, the amount of 
data that is reprocessed after rescaling depends on the checkpointing interval, 
and the restore time depends on the state size.
=> As far as I know, 'rescaling' also works in non-reactive mode, by restoring
from a checkpoint. What is the difference when using 'reactive' here?

The Reactive Mode allows Flink users to implement a powerful autoscaling 
mechanism, by having an external service monitor certain metrics, such as 
consumer lag, aggregate CPU utilization, throughput or latency. As soon as 
these metrics are above or below a certain threshold, additional TaskManagers 
can be added or removed from the Flink cluster.
=> Why is this only possible in 'reactive' mode? Seems this is more related to 
'autoscaler'. Are there some specific features/API which can control 
TaskManager/Parallelism only in 'reactive' mode?
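
For illustration, the decision step of such an external monitoring service could look like the sketch below. The metric (consumer lag), the thresholds, the one-TM step size and the TM limits are all assumptions for this example; actually adding or removing TaskManagers is left to whatever deployment tooling is in use and is not shown.

```python
def desired_taskmanagers(current_tms: int, consumer_lag: float,
                         scale_up_above: float, scale_down_below: float,
                         min_tms: int = 1, max_tms: int = 10) -> int:
    """Threshold rule: one TM more above the upper bound, one fewer below the lower."""
    if scale_down_below > scale_up_above:
        raise ValueError("scale_down_below must not exceed scale_up_above")
    if consumer_lag > scale_up_above:
        desired = current_tms + 1
    elif consumer_lag < scale_down_below:
        desired = current_tms - 1
    else:
        desired = current_tms  # inside the dead band: leave the cluster alone
    # Never go outside the allowed TM range.
    return max(min_tms, min(max_tms, desired))
```

The gap between the two thresholds acts as a dead band, which keeps the service from adding and removing TMs on every small fluctuation of the metric.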

Thank you.

On Fri, 1 Sep 2023 at 15:30, Gyula Fóra <gyula.f...@gmail.com> wrote:
The reactive mode reacts to available resources. The autoscaler reacts to 
changing load and processing capacity and adjusts resources.

Completely different concepts and applicability.
Most people want the autoscaler, but this is a recent feature and is specific
to the k8s operator at the moment.

Gyula

On Fri, 1 Sep 2023 at 04:50, Dennis Jung <inylov...@gmail.com> wrote:
Hello,
Thanks for your notice.

Then what is the purpose of using 'reactive', if this doesn't do anything
itself?
What is the difference if I use auto-scaler without 'reactive' mode?

Regards,
Jung



On Fri, 18 Aug 2023 at 19:51, Gyula Fóra <gyula.f...@gmail.com> wrote:
Hi!

I think what you need is probably not the reactive mode but a proper 
autoscaler. The reactive mode as you say doesn't do anything in itself, you 
need to build a lot of logic around it.

Check this instead: 
https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/

The Kubernetes Operator has a built-in autoscaler that can scale jobs based on
Kafka data rate / processing throughput. It also doesn't rely on the reactive
mode.
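
As a rough sketch of what scaling on data rate versus processing throughput can mean, the snippet below derives a parallelism from an incoming rate and a measured per-subtask processing rate. This is a simplified illustration, not the operator's actual algorithm; the function name, the target utilization and the parallelism cap are assumptions.

```python
import math


def target_parallelism(input_rate: float, per_subtask_rate: float,
                       target_utilization: float = 0.7,
                       max_parallelism: int = 128) -> int:
    """Parallelism needed to keep up with input_rate (records per second).

    per_subtask_rate is how many records per second one subtask can process
    when fully busy; target_utilization leaves headroom for catching up.
    """
    if per_subtask_rate <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("rates and utilization must be positive")
    needed = math.ceil(input_rate / (per_subtask_rate * target_utilization))
    # Keep at least one subtask and respect the upper bound.
    return max(1, min(max_parallelism, needed))
```

For example, at 10,000 records/s with subtasks that each handle 1,000 records/s and a target utilization of 0.5, the sketch asks for a parallelism of 20.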

Cheers,
Gyula

On Fri, Aug 18, 2023 at 12:43 PM Dennis Jung <inylov...@gmail.com> wrote:
Hello,
Sorry for frequent questions. This is a question about 'reactive' mode.

1. As far as I understand, even though I've set up `scheduler-mode: reactive`,
it will not change parallelism automatically by itself based on CPU usage or
Kafka consumer rate. It needs an additional resource monitoring feature (such
as Horizontal Pod Autoscaler or something similar). Is this correct?
2. Is it possible to create a custom resource monitor provider application? For
example, if I want to increase/decrease parallelism by Kafka consumer rate, do
I need to call a specific API from outside to trigger rescaling?
3. If 2 is correct, what is the difference when using 'reactive' mode? Because
as far as I can tell, calling a specific API will rescale whether 'reactive'
mode is used or not...(or does the API only work in this mode)?

Thanks.

Regards
