Re: [Question] How to scale application based on 'reactive' mode

Dennis Jung Fri, 01 Sep 2023 21:58:50 -0700

Hello,
Thanks for the information.

1. With "Flink 1.18 + non-reactive", does the parallelism change with the
number of TMs?
2. The documentation
(https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-release-1.6/docs/custom-resource/autoscaler/)
says "we are not using any container memory / CPU utilization metrics
directly here". Which metrics does it use internally?
3. I'm using standalone Kubernetes
(https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/kubernetes/)
for deployment. Are the autoscaler features only available through the "Flink
Kubernetes operator" (sorry, I don't fully understand this yet...)?
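Regarding question 2: as far as I can tell from the operator documentation linked above, the autoscaler works from Flink's own per-task metrics (backlog and incoming data rate at the sources, record processing rate, and busy/backpressured time of each job vertex) rather than container CPU/memory. The Python sketch below is a simplified model of that idea, not the operator's actual code. It estimates a vertex's "true" processing rate from its busy time and picks a parallelism that would handle a target rate at a chosen utilization. The function names and the `target_utilization` / `max_parallelism` defaults are hypothetical, and the real autoscaler handles things this sketch ignores (for example catching up on backlog).

```python
import math


def true_processing_rate(records_per_sec: float, busy_ratio: float) -> float:
    """Estimate how many records/s a vertex could handle if it were busy 100% of the time.

    records_per_sec is the vertex's total observed rate across all subtasks.
    busy_ratio is the fraction of wall-clock time the vertex spent busy
    (for example busyTimeMsPerSecond / 1000), and must be in (0, 1].
    """
    if not 0.0 < busy_ratio <= 1.0:
        raise ValueError("busy_ratio must be in (0, 1]")
    return records_per_sec / busy_ratio


def target_parallelism(current_parallelism: int,
                       observed_rate: float,
                       busy_ratio: float,
                       target_rate: float,
                       target_utilization: float = 0.7,
                       max_parallelism: int = 128) -> int:
    """Hypothetical scale-factor rule: size the vertex so that processing
    target_rate keeps it busy only target_utilization of the time."""
    capacity = true_processing_rate(observed_rate, busy_ratio)
    scale_factor = target_rate / (capacity * target_utilization)
    new_parallelism = math.ceil(current_parallelism * scale_factor)
    # Clamp to [1, max_parallelism].
    return max(1, min(new_parallelism, max_parallelism))
```

For example, a vertex at parallelism 2 processing 1,000 records/s while busy 50% of the time has an estimated capacity of 2,000 records/s; to sustain 3,000 records/s at 70% utilization, `target_parallelism(2, 1000.0, 0.5, 3000.0)` suggests a parallelism of 5.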


Regards


On Fri, Sep 1, 2023 at 10:20 PM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Pretty much, except that with Flink 1.18 the autoscaler can scale the job in
> place without restarting the JM (even without reactive mode).
>
> So actually the best option is the autoscaler with Flink 1.18 native mode (no
> reactive mode).
>
> Gyula
>
> On Fri, 1 Sep 2023 at 13:54, Dennis Jung <inylov...@gmail.com> wrote:
>
>> Thanks for the feedback.
>> Could you check whether I understand this correctly?
>>
>> *Only using 'reactive' mode:*
>> By manually adding a TaskManager (TM) (e.g. using './bin/taskmanager.sh
>> start'), the parallelism will be increased. For example, when the job
>> parallelism is 1 and there is 1 TM, adding 1 new TM will restart the
>> JobManager and the parallelism will become 2.
>> But the number of TMs is not controlled automatically.
>>
>> *Autoscaler + non-reactive:*
>> It can flexibly control the number of TMs based on several metrics (CPU
>> usage, throughput, ...), and the JobManager will be restarted when scaling.
>> But the job parallelism stays the same after the number of TMs has changed.
>>
>> *Autoscaler + 'reactive' mode:*
>> It can control the number of TMs based on metrics, and increase/decrease the
>> job parallelism by changing the number of TMs.
>>
>> Regards,
>> Jung
>>
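A simplified model of the first two cases in this summary, assuming every operator sits in Flink's default slot sharing group so that the job needs one slot per parallel instance. In Reactive Mode the parallelism follows the available slots (capped by the max parallelism); with a fixed configured parallelism, extra TaskManagers only add idle slots. Both functions are hypothetical illustrations, not Flink APIs.

```python
def reactive_parallelism(num_taskmanagers: int, slots_per_tm: int,
                         max_parallelism: int) -> int:
    """Reactive Mode model: use every available slot, capped by max parallelism.

    Assumes all operators share Flink's default slot sharing group, so the
    job needs one slot per parallel instance.
    """
    available_slots = num_taskmanagers * slots_per_tm
    return min(available_slots, max_parallelism)


def fixed_parallelism(configured: int, num_taskmanagers: int,
                      slots_per_tm: int) -> int:
    """Non-reactive model: parallelism stays at the configured value.

    Extra TaskManagers only add idle slots; too few slots means the job
    cannot be fully scheduled.
    """
    if num_taskmanagers * slots_per_tm < configured:
        raise RuntimeError("not enough slots for the configured parallelism")
    return configured
```

With one slot per TaskManager, `reactive_parallelism(1, 1, 128)` is 1 and `reactive_parallelism(2, 1, 128)` is 2, matching the example in the summary, whereas `fixed_parallelism(1, 2, 1)` stays at 1.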
>> On Fri, Sep 1, 2023 at 8:16 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>
>>> I would look at reactive scaling as a way to increase / decrease
>>> parallelism.
>>>
>>> It's not a way to automatically decide when to actually do it, as you
>>> need to create new TMs.
>>>
>>> The autoscaler could use reactive mode to change the parallelism, but you
>>> need the autoscaler itself to decide when new resources should be added.
>>>
>>> On Fri, 1 Sep 2023 at 13:09, Dennis Jung <inylov...@gmail.com> wrote:
>>>
>>>> For now, what I've found about 'reactive' mode is that it
>>>> automatically adjusts the job parallelism when TaskManagers are
>>>> added/removed.
>>>>
>>>>
>>>> https://www.slideshare.net/FlinkForward/autoscaling-flink-with-reactive-mode
>>>>
>>>> Is there some other feature that only 'reactive' mode offers for
>>>> scaling?
>>>>
>>>> Thanks.
>>>> Regards.
>>>>
>>>>
>>>>
>>>> On Fri, Sep 1, 2023 at 4:56 PM Dennis Jung <inylov...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>> Thank you for your response. I have a few more questions about the following:
>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/elastic_scaling/
>>>>>
>>>>> *Reactive Mode configures a job so that it always uses all resources
>>>>> available in the cluster. Adding a TaskManager will scale up your job,
>>>>> removing resources will scale it down. Flink will manage the parallelism
>>>>> of the job, always setting it to the highest possible values.*
>>>>> => Does this mean that when I add/remove a TaskManager in 'non-reactive'
>>>>> mode, the resources (CPU/memory/etc.) of the cluster do not change?
>>>>>
>>>>> *Reactive Mode restarts a job on a rescaling event, restoring it from
>>>>> the latest completed checkpoint. This means that there is no overhead of
>>>>> creating a savepoint (which is needed for manually rescaling a job). Also,
>>>>> the amount of data that is reprocessed after rescaling depends on the
>>>>> checkpointing interval, and the restore time depends on the state size.*
>>>>> => As far as I know, 'rescaling' also works in non-reactive mode, by
>>>>> restoring from a checkpoint. What is the difference when using 'reactive' mode here?
>>>>>
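Rough arithmetic for the reprocessing cost mentioned in the quoted paragraph: after a restore, everything that arrived after the latest completed checkpoint is replayed (assuming a replayable source such as Kafka), so the worst case is roughly input rate * (checkpoint interval + time for the next checkpoint to complete). The helper below is a hypothetical back-of-the-envelope estimate, not part of Flink.

```python
def worst_case_reprocessed_records(input_rate_per_sec: float,
                                   checkpoint_interval_sec: float,
                                   checkpoint_duration_sec: float = 0.0) -> float:
    """Upper estimate of records replayed after restoring from the latest
    completed checkpoint.

    Everything that arrived after that checkpoint is replayed (assuming a
    replayable source such as Kafka). In the worst case the restore happens
    just before the next checkpoint completes, i.e. up to one interval plus
    that checkpoint's duration later.
    """
    return input_rate_per_sec * (checkpoint_interval_sec + checkpoint_duration_sec)
```

At 10,000 records/s with a 60 s interval, that is up to about 600,000 records (650,000 if the next checkpoint takes 5 s to complete).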
>>>>> *The Reactive Mode allows Flink users to implement a powerful
>>>>> autoscaling mechanism, by having an external service monitor certain
>>>>> metrics, such as consumer lag, aggregate CPU utilization, throughput or
>>>>> latency. As soon as these metrics are above or below a certain threshold,
>>>>> additional TaskManagers can be added or removed from the Flink cluster.*
>>>>> => Why is this only possible in 'reactive' mode? It seems more
>>>>> related to the 'autoscaler'. Are there specific features/APIs that can
>>>>> control TaskManagers/parallelism only in 'reactive' mode?
>>>>>
>>>>> Thank you.
>>>>>
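The third quoted paragraph describes an external service that adds or removes TaskManagers when a metric crosses a threshold. Below is a minimal sketch of one decision step of such a service, assuming Kafka consumer lag as the metric. The thresholds, class, and function names are hypothetical, and actually applying the result (for example by changing the replica count of the TaskManager Deployment in a standalone Kubernetes setup) is left out. With Reactive Mode, changing the number of TaskManagers is what makes Flink rescale the job; without it, the job's parallelism would not follow the new TaskManager count.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical thresholds for a lag-based external monitor."""
    scale_up_lag: int = 100_000   # add a TaskManager above this consumer lag
    scale_down_lag: int = 1_000   # remove a TaskManager below this consumer lag
    min_tms: int = 1
    max_tms: int = 10


def desired_taskmanagers(current_tms: int, consumer_lag: int,
                         policy: ScalingPolicy) -> int:
    """One step of a threshold-based monitor: move the TaskManager count by
    at most one, staying within [min_tms, max_tms]."""
    if consumer_lag > policy.scale_up_lag:
        target = current_tms + 1
    elif consumer_lag < policy.scale_down_lag:
        target = current_tms - 1
    else:
        target = current_tms
    return max(policy.min_tms, min(target, policy.max_tms))
```

Starting from 2 TaskManagers, a lag of 500,000 records yields 3, while a lag of 10 records yields 1 (and never goes below `min_tms`).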
>>>>> On Fri, Sep 1, 2023 at 3:30 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>
>>>>>> The reactive mode reacts to available resources. The autoscaler
>>>>>> reacts to changing load and processing capacity and adjusts resources.
>>>>>>
>>>>>> They are completely different concepts with different applicability.
>>>>>> Most people want the autoscaler, but this is a recent feature and is
>>>>>> specific to the k8s operator at the moment.
>>>>>>
>>>>>> Gyula
>>>>>>
>>>>>> On Fri, 1 Sep 2023 at 04:50, Dennis Jung <inylov...@gmail.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>> Thanks for the information.
>>>>>>>
>>>>>>> Then what is the purpose of using 'reactive' mode if it doesn't do
>>>>>>> anything by itself?
>>>>>>> What is the difference if I use the autoscaler without 'reactive' mode?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Jung
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 18, 2023 at 7:51 PM Gyula Fóra <gyula.f...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I think what you need is probably not the reactive mode but a
>>>>>>>> proper autoscaler. The reactive mode, as you say, doesn't do anything by
>>>>>>>> itself; you need to build a lot of logic around it.
>>>>>>>>
>>>>>>>> Check this instead:
>>>>>>>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/custom-resource/autoscaler/
>>>>>>>>
>>>>>>>> The Kubernetes Operator has a built-in autoscaler that can scale
>>>>>>>> jobs based on Kafka data rate / processing throughput. It also doesn't
>>>>>>>> rely on the reactive mode.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Gyula
>>>>>>>>
>>>>>>>> On Fri, Aug 18, 2023 at 12:43 PM Dennis Jung <inylov...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> Sorry for the frequent questions. This is a question about 'reactive'
>>>>>>>>> mode.
>>>>>>>>>
>>>>>>>>> 1. As far as I understand, even though I've set
>>>>>>>>> `scheduler-mode: reactive`, it will not change the parallelism
>>>>>>>>> automatically by itself based on CPU usage or Kafka consumer rate. It
>>>>>>>>> needs additional resource monitoring features (such as the Horizontal
>>>>>>>>> Pod Autoscaler, or something else). Is this correct?
>>>>>>>>> 2. Is it possible to create a custom resource monitoring application?
>>>>>>>>> For example, if I want to increase/decrease the parallelism based on
>>>>>>>>> Kafka consumer rate, do I need to call a specific API from outside to
>>>>>>>>> trigger rescaling?
>>>>>>>>> 3. If 2 is correct, what is the difference when using 'reactive' mode?
>>>>>>>>> As far as I can tell, calling a specific API would rescale the job
>>>>>>>>> whether or not 'reactive' mode is used... (or does the API only work
>>>>>>>>> in this mode?)
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>>
>>>>>>>>>
