Hi Talat!

It would be great to have an HPA that works based on some Flink
throughput/backlog metrics. I wonder how you are going to access the Flink
metrics in the HPA; we might need some integration with the k8s metrics
system.
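
For reference, a minimal sketch of pulling a backlog signal out of Flink's
REST API: this is the kind of metric a custom metrics adapter would have to
expose to the HPA. It assumes the source implements the FLIP-33 standardized
pendingRecords metric (not every connector does), and the REST address,
job id and vertex id are placeholders:

    import requests

    FLINK_REST = "http://localhost:8081"     # placeholder JobManager REST address
    JOB_ID = "<job-id>"                      # placeholder job id
    SOURCE_VERTEX_ID = "<source-vertex-id>"  # placeholder source vertex id

    def source_backlog() -> float:
        # Sum the pendingRecords metric across all source subtasks using
        # the aggregated subtask metrics endpoint.
        resp = requests.get(
            f"{FLINK_REST}/jobs/{JOB_ID}/vertices/{SOURCE_VERTEX_ID}/subtasks/metrics",
            params={"get": "pendingRecords", "agg": "sum"},
        )
        resp.raise_for_status()
        metrics = resp.json()  # e.g. [{"id": "pendingRecords", "sum": 12345.0}]
        return float(metrics[0]["sum"]) if metrics else 0.0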
In any case, whether we need a FLIP or not depends on the complexity; if
it's simple then we can go without one.

Cheers,
Gyula

On Tue, May 24, 2022 at 12:26 PM Talat Uyarer <tuya...@paloaltonetworks.com>
wrote:

> Hi Gyula,
>
> This seems very promising for initial scaling. We are using the Flink
> Kubernetes Operator. Most probably we are very early adopters of it :)
> Let me try it and get back to you soon.
>
> My plan is to build a general-purpose CPU- and backlog/throughput-based
> autoscaler for Flink. I can create a custom open source HPA on top of your
> changes. Do I need to create a FLIP for it?
>
> Just some general information about us: today we use another execution
> environment if the job scheduler does not support autoscaling. Having an
> HPA works if your sources are well balanced. If there is an uneven
> distribution across sources, having an autoscaling feature in the
> scheduler can give better utilization. But this is not urgent; we can
> start using your PR at least for a while.
>
> Thanks
>
> On Mon, May 23, 2022 at 4:10 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> Hi Talat!
>>
>> One other approach that we are currently investigating is combining the
>> Flink Kubernetes Operator
>> <https://github.com/apache/flink-kubernetes-operator> with the K8s
>> scaling capabilities (Horizontal Pod Autoscaler).
>>
>> In this approach the HPA monitors the TaskManager pods directly and can
>> modify the FlinkDeployment resource replica count to trigger a stateful
>> job scale-up/down through the operator (see the sketch below).
>> It is obviously not as nice as the reactive mode, but it works easily with
>> the current Kubernetes Native implementation. It is also theoretically
>> possible to integrate this with other custom Flink metrics, but we haven't
>> tested that yet.
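>>
>> As a rough illustration, the replica patch an external HPA (or any other
>> controller) would perform could look like this. It is only a sketch using
>> the official Python kubernetes client; it assumes the operator exposes a
>> scale subresource on the FlinkDeployment CRD as the POC proposes, and the
>> group/version/name values are placeholders:
>>
>>     from kubernetes import client, config
>>
>>     config.load_kube_config()  # use load_incluster_config() in-cluster
>>     api = client.CustomObjectsApi()
>>
>>     # Patch the scale subresource of the FlinkDeployment custom resource;
>>     # the operator then performs the stateful scale-up/down.
>>     api.patch_namespaced_custom_object_scale(
>>         group="flink.apache.org",       # assumed CRD group
>>         version="v1beta1",              # assumed CRD version
>>         namespace="default",
>>         plural="flinkdeployments",
>>         name="my-flink-job",            # placeholder deployment name
>>         body={"spec": {"replicas": 3}},
>>     )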
>>
>> I have created a POC pull request that showcases these capabilities:
>> https://github.com/apache/flink-kubernetes-operator/pull/227
>>
>> If you are interested, it would be nice if you could check it out and
>> provide feedback; we will get back to refining this after our current
>> ongoing release.
>>
>> Cheers,
>> Gyula
>>
>> On Mon, May 23, 2022 at 12:23 AM David Morávek <d...@apache.org> wrote:
>>
>>> Hi Talat,
>>>
>>> This is definitely an interesting and rather complex topic.
>>>
>>> A few unstructured thoughts / notes / questions:
>>>
>>> - The main struggle has always been that it's hard to come up with a
>>> generic, one-size-fits-all metric for autoscaling.
>>>   - Flink doesn't have knowledge of the external environment (e.g.
>>> capacity planning on the cluster, no notion of preemption), so it cannot
>>> really make a qualified decision in some cases.
>>>   - ^ the above follows the same reasoning as why we don't support
>>> reactive mode with the session cluster (multi-job scheduling)
>>> - The re-scaling decision logic most likely needs to be pluggable for
>>> the above reasons
>>>   - We're in general fairly concerned about running any user code in the
>>> JM for stability reasons.
>>>   - The most flexible option would be allowing the desired parallelism
>>> to be set via the REST API and leaving the scaling decision to an
>>> external process, which could be reused for both standalone and "active"
>>> deployment modes (there is actually a prototype by Till that allows this
>>> [1]; see the sketch after this list)
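>>>
>>> A sketch of what such an external process could look like. Note that the
>>> /jobs/<id>/parallelism endpoint below is hypothetical, mimicking what
>>> Till's prototype exposes; it is not part of Flink's current REST API:
>>>
>>>     import requests
>>>
>>>     def set_desired_parallelism(base_url: str, job_id: str, p: int) -> None:
>>>         # Hypothetical endpoint: hand the target parallelism to the
>>>         # scheduler and let it re-scale once resources allow.
>>>         resp = requests.patch(f"{base_url}/jobs/{job_id}/parallelism",
>>>                               json={"parallelism": p})
>>>         resp.raise_for_status()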
>>>
>>> How do you intend to make the autoscaling decision? Also note that
>>> re-scaling is still a fairly expensive operation (especially with large
>>> state), so you need to make sure the autoscaler doesn't oscillate and
>>> doesn't re-scale too often (this is also something that could vary from
>>> workload to workload).
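>>>
>>> For example, a trivial guard against oscillation; purely illustrative,
>>> all thresholds are made up and would need tuning per workload:
>>>
>>>     import time
>>>
>>>     COOLDOWN_S = 15 * 60   # don't re-scale more often than every 15 min
>>>     HIGH, LOW = 0.8, 0.3   # hysteresis band on observed utilization
>>>     last_rescale = 0.0
>>>
>>>     def decide(utilization: float, parallelism: int) -> int:
>>>         global last_rescale
>>>         if time.time() - last_rescale < COOLDOWN_S:
>>>             return parallelism               # still in the cooldown window
>>>         if utilization > HIGH:
>>>             last_rescale = time.time()
>>>             return parallelism * 2           # scale up aggressively
>>>         if utilization < LOW:
>>>             last_rescale = time.time()
>>>             return max(1, parallelism // 2)  # scale down conservatively
>>>         return parallelism                   # inside the band: do nothing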
>>>
>>> A note on the metrics question, with an auto-scaler living in the JM:
>>> - We shouldn't really collect the metrics in the JM; instead, the JM can
>>> pull them from the TMs directly on demand (basically the same thing an
>>> external auto-scaler would do).
>>>
>>> Looking forward to your thoughts
>>>
>>> [1] https://github.com/tillrohrmann/flink/commits/autoscaling
>>>
>>> Best,
>>> D.
>>>
>>> On Mon, May 23, 2022 at 8:32 AM Talat Uyarer <
>>> tuya...@paloaltonetworks.com>
>>> wrote:
>>>
>>> > Hi,
>>> > I am working on autoscaling support for native deployments. Today
>>> > Flink provides Reactive Mode, however it only runs on standalone
>>> > deployments. We use the Kubernetes native deployment, so I want to be
>>> > able to increase or decrease job resources for our streaming jobs. The
>>> > recent FLIP-138 and FLIP-160 are very useful for achieving this goal.
>>> > I started reading the code of the Flink JobManager, AdaptiveScheduler,
>>> > DeclarativeSlotPool, etc.
>>> >
>>> > My assumption is that the required resources will be calculated in the
>>> > AdaptiveScheduler whenever the scheduler receives a heartbeat from a
>>> > task manager, via the public void
>>> > updateAccumulators(AccumulatorSnapshot accumulatorSnapshot) method.
>>> >
>>> > I checked the TaskExecutorToJobManagerHeartbeatPayload class, however
>>> > I only see *accumulatorReport* and *executionDeploymentReport*. Do you
>>> > have any suggestions for collecting metrics from TaskManagers? Should
>>> > I add the metrics to TaskExecutorToJobManagerHeartbeatPayload?
>>> >
>>> > I am open to other suggestions for this. Once I finalize my
>>> > investigation, I will create a FLIP with a more detailed implementation.
>>> >
>>> > Thanks for your help in advance.
>>> > Talat
>>> >
>>>
>>
