[
https://issues.apache.org/jira/browse/FLINK-38116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089174#comment-18089174
]
Dennis-Mircea Ciupitu commented on FLINK-38116:
-----------------------------------------------
The root-cause diagnosis (fabric8 plus the Kubernetes 1.31/1.32
{{RequestWatchProgress}} storage feature) is right, but the error does not come
from the operator's own fabric8. It comes from the client bundled inside
{{flink-kubernetes}}, which runs inside the JobManager: the stack shows
{{org.apache.flink.kubernetes.shaded.io.fabric8}}, the shaded client that
{{KubernetesHaServices}} uses for the leader-election watch.
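For triage, the shaded package prefix in {{loggerName}} is enough to tell the two clients apart. A minimal sketch, assuming JSON log lines shaped like the one in the description and assuming an unshaded client (such as the operator's) logs under plain {{io.fabric8}}; the function names are hypothetical helpers, not part of Flink or fabric8:

```python
import json

# Prefix of the fabric8 client shaded into flink-kubernetes (seen in this issue's log).
SHADED_PREFIX = "org.apache.flink.kubernetes.shaded.io.fabric8."
# Assumption: an unshaded fabric8 client logs under its original package.
UNSHADED_PREFIX = "io.fabric8."


def fabric8_origin(logger_name: str) -> str:
    """Classify a logger name by which fabric8 client it belongs to."""
    if logger_name.startswith(SHADED_PREFIX):
        return "flink-bundled"
    if logger_name.startswith(UNSHADED_PREFIX):
        return "unshaded"
    return "other"


def origin_of_log_line(line: str) -> str:
    """Parse one JSON log line and classify its loggerName field."""
    return fabric8_origin(json.loads(line).get("loggerName", ""))
```

Running {{origin_of_log_line}} over the JobManager log and the operator log separately should show which process actually emits the {{RequestWatchProgress}} error.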
Two things worth noting since this was opened:
- The operator's own fabric8 has since been bumped from {{6.13.2}} (in operator
1.12.0) to {{7.3.1}} on main. That is past {{7.1.0}}, the minimum client
version for k8s 1.32, so the operator process itself supports 1.32.
- The JobManager's client version is whatever the Flink runtime image bundles
(Flink 1.20 ships an older 6.x), and the operator's {{fabric8.version}} does
not affect it. So this can only be resolved by running a Flink image with a
newer bundled fabric8 (a Flink-core upgrade), or by enabling the
{{RequestWatchProgress}} feature on the apiserver.
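The version comparison behind both points can be sketched as follows. This is only an illustration, assuming plain {{major.minor.patch}} version strings and taking {{7.1.0}} as the 1.32 floor stated in the issue; the helper names are hypothetical:

```python
def parse_version(version: str) -> tuple:
    """Turn 'major.minor.patch' into a tuple of ints so comparison is numeric."""
    return tuple(int(part) for part in version.split("."))


# Minimum fabric8 client version for Kubernetes 1.32, as stated in this issue.
K8S_1_32_FLOOR = parse_version("7.1.0")


def supports_k8s_1_32(fabric8_version: str) -> bool:
    """True if the given fabric8 version is at or above the 1.32 floor."""
    return parse_version(fabric8_version) >= K8S_1_32_FLOOR


for version in ("6.13.2", "7.3.1"):
    print(version, supports_k8s_1_32(version))
```

The same check applies to whichever fabric8 version the Flink runtime image bundles, once that version is known.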
Could you confirm which Flink image/version the JobManager runs, and whether HA
actually breaks or this is ERROR-log noise? If it's purely the JM-side client,
this likely belongs in a Flink core ticket rather than the operator.
> Flink kubernetes operator RequestWatchProgress Error
> ----------------------------------------------------
>
> Key: FLINK-38116
> URL: https://issues.apache.org/jira/browse/FLINK-38116
> Project: Flink
> Issue Type: Bug
> Components: Kubernetes Operator
> Affects Versions: 1.12.0
> Reporter: David Ballano
> Assignee: Diljeet Singh
> Priority: Major
>
> Hi guys,
> Our Kubernetes cluster has been upgraded to version 1.32, and since then
> we've been encountering the following error in the job-manager:
> {code:java}
> {"instant":{"epochSecond":1752771773,"nanoOfSecond":610256000},"thread":"OkHttp https://100.64.0.1/...","level":"ERROR","loggerName":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager","message":"Error
> received: Status(apiVersion=v1, code=500, details=null, kind=Status,
> message=a watch stream was requested by the client but the required storage
> feature RequestWatchProgress is disabled, metadata=ListMeta(_continue=null,
> remainingItemCount=null, resourceVersion=null, selfLink=null,
> additionalProperties={}), reason=InternalError, status=Failure,
> additionalProperties={}), will
> retry","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","contextMap":{},"threadId":82,"threadPriority":5}{code}
> After some investigation, it appears the Kubernetes client library used by
> the Flink operator may be outdated and incompatible with Kubernetes 1.32?
> Reference:
> https://github.com/apache/flink-kubernetes-operator/blob/release-1.12.0/pom.xml#L81
> Support for Kubernetes 1.32 was introduced in client version 7.1.0.
> Thanks.