[jira] [Commented] (FLINK-36557) Stale Autoscaler Context in Kubernetes Operator

Sai Sharath Dandi (Jira) Wed, 16 Oct 2024 14:59:12 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-36557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17890258#comment-17890258
 ]


Sai Sharath Dandi commented on FLINK-36557:
-------------------------------------------

We're building an autoscaler for YARN that follows the same pattern as the 
Kubernetes operator, and we observed that applications are sometimes not 
scaled. I think the same problem applies to the kubernetes-operator, unless I 
missed something in the code. 

> Stale Autoscaler Context in Kubernetes Operator
> -----------------------------------------------
>
>                 Key: FLINK-36557
>                 URL: https://issues.apache.org/jira/browse/FLINK-36557
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Sai Sharath Dandi
>            Priority: Minor
>
> The KubernetesJobAutoScalerContext is 
> [cached|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkResourceContext.java#L59]
>  in the FlinkResourceContext and reused. If the JobAutoScalerContext is 
> initialized before the job reaches the RUNNING state, the cached context 
> keeps the stale job status and the autoscaler may never trigger: 
> [link|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/JobAutoScalerImpl.java#L98].
>  
> We need to either refresh the JobAutoScalerContext, as the standalone 
> [implementation|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-autoscaler-standalone/src/main/java/org/apache/flink/autoscaler/standalone/StandaloneAutoscalerExecutor.java#L127]
>  does, or have the autoscaler module itself refresh the job status.
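
The failure mode described in the issue can be illustrated with a minimal, self-contained sketch. It does not use the operator's real classes: StaleContextSketch, Context, CachingHolder, RefreshingHolder and scale() are hypothetical stand-ins. It also assumes that the autoscaler guard is a plain check that the context's job status is RUNNING. The point is only to show why a context that snapshots the status once and is then reused can block scaling, while a context rebuilt on every call cannot.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical, simplified stand-ins; these are NOT the operator's real classes.
class StaleContextSketch {

    enum JobStatus { CREATED, RUNNING }

    /** Context that snapshots the job status once, at construction time. */
    static final class Context {
        final JobStatus jobStatus;

        Context(JobStatus jobStatus) {
            this.jobStatus = jobStatus;
        }
    }

    /** Assumed guard: scaling is skipped unless the context reports RUNNING. */
    static boolean scale(Context ctx) {
        return ctx.jobStatus == JobStatus.RUNNING;
    }

    /** Builds the context on first access and reuses it afterwards. */
    static final class CachingHolder {
        private final Supplier<JobStatus> statusSource;
        private Context cached;

        CachingHolder(Supplier<JobStatus> statusSource) {
            this.statusSource = statusSource;
        }

        Context get() {
            if (cached == null) {
                cached = new Context(statusSource.get());
            }
            return cached;
        }
    }

    /** Rebuilds the context on every access, so the status is always current. */
    static final class RefreshingHolder {
        private final Supplier<JobStatus> statusSource;

        RefreshingHolder(Supplier<JobStatus> statusSource) {
            this.statusSource = statusSource;
        }

        Context get() {
            return new Context(statusSource.get());
        }
    }

    public static void main(String[] args) {
        AtomicReference<JobStatus> live = new AtomicReference<>(JobStatus.CREATED);
        CachingHolder caching = new CachingHolder(live::get);
        RefreshingHolder refreshing = new RefreshingHolder(live::get);

        caching.get();                 // context initialized before the job is RUNNING
        live.set(JobStatus.RUNNING);   // job reaches RUNNING afterwards

        System.out.println("cached:    " + scale(caching.get()));    // false
        System.out.println("refreshed: " + scale(refreshing.get())); // true
    }
}
```

In this sketch the refreshing holder corresponds to the first option in the description (rebuild the context), and moving the statusSource lookup into scale() would correspond to the second (let the autoscaler refresh the job status itself).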



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
