Hi Prasanna,

IIUC, your screenshot shows the scaling feature of an EMR cluster, not
Flink.

Let me try to better understand your question. Which kind of rescaling do
you need?
- If you deploy a long running streaming job, and want it to dynamically
rescale based on the real-time incoming data stream. Flink does not support
it at the moment.
- If you have various jobs, and the amount of jobs need to be executed
changes along the time, this can be supported in either of the following
ways.
  - You can submit your workloads as Flink Single Jobs[1]. In this way, you
can simply rescale your EMR cluster, and Flink does not need to be aware of
that.
  - You can deploy a Flink Session[2], and submit your jobs to this
session. In this way, Flink will automatically request new workers from and
release idle workers to Yarn.

AFAIK, AWS EMR provides out-of-box Flink integration only for the single
job mode. The session mode is not supported. But I haven't checked this for
quite a while. It could have been changed.

Thank you~

Xintong Song


[1]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/yarn_setup.html#run-a-single-flink-job-on-yarn
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/yarn_setup.html#flink-yarn-session

On Mon, Jun 15, 2020 at 11:55 AM Prasanna kumar <
prasannakumarram...@gmail.com> wrote:

> Thanks Xintong and Yu Yang for the replies,
>
> I see AWS provides deploying Flink on EMR out of the box. There they have
> an option of EMR cluster scaling based on the load.
>
> Is this not equal to dynamic rescaling ?
>
> [image: Screen Shot 2020-06-15 at 9.23.24 AM.png]
>
>
>
> https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-scaling.html
>
> Let me know your thoughts on the same.
>
> Prasanna.
>
>
> On Wed, Jun 10, 2020 at 7:33 AM Xintong Song <tonysong...@gmail.com>
> wrote:
>
>> Hi Prasanna,
>>
>> Flink does not support dynamic rescaling at the moment.
>>
>> AFAIK, there are some companies in China already have solutions for
>> dynamic scaling Flink jobs (Alibaba, 360, etc.), but none of them are yet
>> available to the community version. These solutions rely on an external
>> system to monitor the workload and rescale the job accordingly. In case of
>> rescaling, it requires a full stop of the data processing, then rescale,
>> then recover from the most recent checkpoint.
>>
>> The Flink community is also preparing a declarative resource management
>> approach, which should allow the job to dynamically adapt to the available
>> resources (e.g., add/reduce pods on kubernetes). AFAIK, this is still in
>> the design discussion.
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>>
>> On Wed, Jun 10, 2020 at 2:44 AM Prasanna kumar <
>> prasannakumarram...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Does flink support dynamic scaling. Say try to add/reduce nodes based
>>> upon incoming load.
>>>
>>> Because our use case is such that we get peak loads for 4 hours and then
>>> medium loads for 8 hours and then light to no load for rest 2 hours.
>>>
>>> Or peak load would be atleast 5 times the medium load.
>>>
>>> Has anyone used flink in these type of scenario? We are looking at flink
>>> for it's low latency performance.
>>>
>>> Earlier I worked with Spark+YARN which provides a features to dynamicaly
>>> add/reduce executors.
>>>
>>> Wanted to know the same on flink.
>>>
>>> Thanks,
>>> Prasanna
>>>
>>

Reply via email to