Hello Leon,
As described by Chen below Adaptive Scheduler doesn't perform auto scale a
Flink Job other than allocating the requested slots based on availability.
Recently we implemented this with EMR managed scaling by combining adaptive
scheduler since there's no direct support of auto scaling on yarn at Flink.
If you are running an application on infrastructure similar to AWS EMR, you can
use scaling policies to scale up the cluster and scale down the cluster based
on requested slots but it wont really work with incoming traffic since there's
no way of adjusting flink parallelism based on incoming traffic.
Regards,Madan
On Wednesday, 28 June 2023 at 08:43:22 am GMT-7, Chen Zhanghao
<[email protected]> wrote:
Hi Leon,
Adaptive scheduler alone cannot autoscale a Flink job. It simply adjusts the
parallelism of a job based on available slots [1]. To autoscale a job, we
further need a policy to suggest the recommended resources for the job and a
mechanism to adjust the allocated resources of the job (aka. available
slots).For K8s standalone application mode, we can use reactive mode coupled
with K8s HPA, where HPA collects pod metrics and autoscales the number of TMs,
and adaptive scheduler rescales job according to the available slots. For YARN
application mode, reactive mode is not available. However, in the coming 1.18
release, we can declare the desired resources through REST API to adjust the
allocated resources of the job via FLIP-291 [2], but you still need a policy to
suggest the recommended resources for the job and call the API, which you can
refer to the autoscaler implemention in Flink K8s operator.
[1] Elastic Scaling | Apache Flink[2] FLIP-291: Externalized Declarative
Resource Management - Apache Flink - Apache Software Foundation[3] Autoscaler |
Apache Flink Kubernetes Operator
Best,Zhanghao Chen发件人: Leon Xu <[email protected]>
发送时间: 2023年6月27日 13:41
收件人: user <[email protected]>
主题: Questions regarding adaptive scheduler with YARN and application mode Hi
Flink users,
I am trying to use Adaptive Scheduler to auto scale our Flink streaming jobs
(NOT batch job). Our jobs are running on YARN with application mode. There
isn't much doc around how adaptive scheduler works. So I have some questions:
- How does Adaptive Scheduler work with YARN/Application mode? If the
scheduler decides to request more tasks will it trigger the request to YARN
while the job is already running
- What's the evaluation criteria to trigger a scale-up ? Is it possible to
manually trigger a scale-up for testing purposes?
Thanks