Hi,
I am working on auto scaling support for native deployments. Today Flink
provides Reactive Mode, but it only runs on standalone deployments. We
use the Kubernetes native deployment, so I want to be able to increase or
decrease job resources for our streaming jobs. The recent FLIP-138 and
FLIP-160 are very useful for achieving this goal. I have started reading the
code of the Flink JobManager, AdaptiveScheduler, DeclarativeSlotPool, etc.

My assumption is that the required resources will be calculated in the
AdaptiveScheduler whenever the scheduler receives a heartbeat from a
TaskManager, via the
public void updateAccumulators(AccumulatorSnapshot accumulatorSnapshot)
method.

I checked the TaskExecutorToJobManagerHeartbeatPayload class, but I only see
*accumulatorReport* and *executionDeploymentReport*. Do you have any
suggestions for collecting metrics from TaskManagers? Should I add metrics to
TaskExecutorToJobManagerHeartbeatPayload?
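
To make the question concrete, here is a rough sketch of the kind of
extension I have in mind. It is plain Java with no Flink dependency. The
class name, the metricReport field, and the example metric name are all
hypothetical; this is not the actual TaskExecutorToJobManagerHeartbeatPayload
code.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch only. It mimics the idea of a heartbeat payload that
 * carries a metric snapshot next to the existing reports; it is NOT the real
 * Flink class and does not reflect its actual fields or constructors.
 */
final class HeartbeatPayloadWithMetricsSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical new field: metric name -> latest value seen on the TaskManager.
    private final Map<String, Double> metricReport;

    HeartbeatPayloadWithMetricsSketch(Map<String, Double> metricReport) {
        // Defensive copy so the snapshot cannot change after it is created.
        this.metricReport = Collections.unmodifiableMap(new HashMap<>(metricReport));
    }

    Map<String, Double> getMetricReport() {
        return metricReport;
    }
}
```

The scheduler side would then read getMetricReport() from each heartbeat
when deciding whether to change the required resources. Whether the heartbeat
is the right channel for this is exactly what I am unsure about.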

I am open to other suggestions for this. Once I finalize my
investigation, I will create a FLIP with a more detailed implementation.

Thanks for your help in advance.
Talat
