[ 
https://issues.apache.org/jira/browse/FLINK-38116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089174#comment-18089174
 ] 

Dennis-Mircea Ciupitu commented on FLINK-38116:
-----------------------------------------------

The root cause (fabric8 + the Kubernetes 1.31/1.32 {{RequestWatchProgress}} 
storage feature) is right, but the error comes from the fabric8 client bundled 
in {{flink-kubernetes}} and running inside the JobManager, not from the 
operator's own fabric8. The logger name in the reported line starts with 
{{org.apache.flink.kubernetes.shaded.io.fabric8}}, the shaded copy that 
{{KubernetesHaServices}} uses for the leader-election watch.
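
For anyone triaging a similar log line, the logger-name prefix is a quick way 
to tell the two clients apart. A minimal sketch, assuming the operator logs 
under the unshaded {{io.fabric8}} package; the class name, method name and 
return labels are hypothetical, and only the shaded prefix is taken from the 
log in this issue:

```java
public class Fabric8Origin {
    // Relocated package seen in the reported loggerName (from this issue's log).
    static final String SHADED_PREFIX = "org.apache.flink.kubernetes.shaded.io.fabric8.";
    // Assumption: a client that is not relocated logs under the plain package.
    static final String PLAIN_PREFIX = "io.fabric8.";

    /** Returns a rough label for where a fabric8 logger most likely lives. */
    static String classify(String loggerName) {
        if (loggerName.startsWith(SHADED_PREFIX)) {
            return "flink-runtime"; // client bundled in flink-kubernetes (JobManager side)
        }
        if (loggerName.startsWith(PLAIN_PREFIX)) {
            return "unshaded"; // e.g. a fabric8 client used directly by another process
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String reported = "org.apache.flink.kubernetes.shaded.io.fabric8"
                + ".kubernetes.client.dsl.internal.AbstractWatchManager";
        System.out.println(classify(reported));
    }
}
```

This is only a heuristic for reading logs: it says which copy of the client 
emitted the line, not which fabric8 version that copy is.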

Two things worth noting since this was opened:
 - The operator's own fabric8 has since been bumped from {{6.13.2}} (in 
operator 1.12.0) to {{7.3.1}} on main, above the {{7.1.0}} floor for 
Kubernetes 1.32, so the operator process itself supports 1.32.
 - The JobManager's client version is whatever the Flink runtime image bundles 
(Flink 1.20 ships an older 6.x), and the operator's {{fabric8.version}} does 
not affect it. So this can only be resolved by running a Flink image with a 
newer bundled fabric8 (a Flink-core upgrade), or by enabling the 
{{RequestWatchProgress}} feature on the apiserver.

Could you confirm which Flink image/version the JobManager runs, and whether HA 
actually breaks or this is ERROR-log noise? If it's purely the JM-side client, 
this likely belongs in a Flink core ticket rather than the operator.

> Flink kubernetes operator RequestWatchProgress Error
> ----------------------------------------------------
>
>                 Key: FLINK-38116
>                 URL: https://issues.apache.org/jira/browse/FLINK-38116
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: 1.12.0
>            Reporter: David Ballano
>            Assignee: Diljeet Singh
>            Priority: Major
>
> Hi guys,
> Our Kubernetes cluster has been upgraded to version 1.32, and since then 
> we've been encountering the following error in the job-manager:
> {code:java}
> {"instant":{"epochSecond":1752771773,"nanoOfSecond":610256000},"thread":"OkHttp https://100.64.0.1/...","level":"ERROR","loggerName":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager","message":"Error received: Status(apiVersion=v1, code=500, details=null, kind=Status, message=a watch stream was requested by the client but the required storage feature RequestWatchProgress is disabled, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=InternalError, status=Failure, additionalProperties={}), will retry","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":82,"threadPriority":5}
> {code}
> After some investigation, it appears the Kubernetes client library used by 
> the Flink operator may be outdated and incompatible with Kubernetes 1.32?
> Reference:
> https://github.com/apache/flink-kubernetes-operator/blob/release-1.12.0/pom.xml#L81
> Support for Kubernetes 1.32 was introduced in client version 7.1.0.
> Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
