[ https://issues.apache.org/jira/browse/FLINK-34576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823613#comment-17823613 ]

Gyula Fora edited comment on FLINK-34576 at 3/5/24 1:19 PM:
------------------------------------------------------------

I am a bit busy at the moment, so it will take some time until I get to this.
In the meantime, I have a few questions:

1. Is there a way to reproduce this with a smaller setup?
2. Have you tried operator version 1.7.0? We may have fixed the issue there
already.
3. Does it also affect newer Flink versions?
4. Can you share some relevant operator logs?

Thanks



> Flink deployment keeps staying in RECONCILING/STABLE status
> -----------------------------------------------------------
>
>                 Key: FLINK-34576
>                 URL: https://issues.apache.org/jira/browse/FLINK-34576
>             Project: Flink
>          Issue Type: Bug
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.6.1
>            Reporter: chenyuzhi
>            Priority: Major
>         Attachments: image-2024-03-05-15-13-11-032.png
>
>
> We run flink-kubernetes-operator in HA mode. When one of the operator pods
> restarts, leadership switches to another operator pod. After that, some
> FlinkDeployments stay in the *JOB_STATUS=RECONCILING&LIFECYCLE_STATE=STABLE*
> state for a long time. Running "kubectl describe flinkdeployment xxx" shows
> the following error, but there are no exceptions in the
> flink-kubernetes-operator log.
>  
> {code:java}
> Status:
>   Cluster Info:
>     Flink - Revision:             b6d20ed @ 2023-12-20T10:01:39+01:00
>     Flink - Version:              1.14.0-GDC1.6.0
>     Total - Cpu:                  7.0
>     Total - Memory:               30064771072
>   Error:                          {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Failed to load configuration","additionalMetadata":{},"throwableList":[{"type":"org.apache.flink.shaded.guava30.com.google.common.util.concurrent.UncheckedExecutionException","message":"java.lang.RuntimeException: Failed to load configuration","additionalMetadata":{}},{"type":"java.lang.RuntimeException","message":"Failed to load configuration","additionalMetadata":{}}]}
>   Job Manager Deployment Status:  READY
>   Job Status:
>     Job Id:    cf44b5e73a1f263dd7d9f2c82be5216d
>     Job Name:  noah_stream_studio_1754211682_2218100380
>     Savepoint Info:
>       Last Periodic Savepoint Timestamp:  0
>       Savepoint History:
>     Start Time:     1705635107137
>     State:          RECONCILING
>     Update Time:    1709272530741
>   Lifecycle State:  STABLE {code}
>  
> !image-2024-03-05-15-13-11-032.png!
>  
> Versions:
> flink-kubernetes-operator: 1.6.1
> flink: 1.14.0 / 1.15.2 (1200+ FlinkDeployments)
>  
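With 1200+ FlinkDeployments involved, it may help to list exactly which ones are stuck before collecting operator logs. Below is a minimal Python sketch, not part of the operator, assuming the parsed JSON output of `kubectl get flinkdeployments -A -o json` and assuming that the `kubectl describe` fields shown above map to `status.jobStatus.state`, `status.lifecycleState` and `status.error` in the CRD. The script name `find_stuck.py` and the function `find_stuck_deployments` are hypothetical.

```python
import json
import sys
from typing import Any, Dict, List, Tuple


def find_stuck_deployments(doc: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, error message) for every FlinkDeployment whose
    job state is RECONCILING while its lifecycle state is STABLE.

    `doc` is the parsed output of `kubectl get flinkdeployments -A -o json`.
    Assumes the CRD status layout: status.jobStatus.state,
    status.lifecycleState and status.error.
    """
    stuck = []
    for item in doc.get("items", []):
        meta = item.get("metadata", {})
        status = item.get("status") or {}
        job_state = (status.get("jobStatus") or {}).get("state")
        lifecycle = status.get("lifecycleState")
        if job_state == "RECONCILING" and lifecycle == "STABLE":
            raw_error = status.get("error") or ""
            try:
                # The error field is assumed to be a serialized JSON object;
                # pull out its "message" when it parses as one.
                message = json.loads(raw_error).get("message", raw_error)
            except (ValueError, AttributeError):
                # Empty or non-JSON-object errors are reported verbatim.
                message = raw_error
            stuck.append((meta.get("namespace", ""), meta.get("name", ""), message))
    return stuck


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python find_stuck.py fd.json
    with open(sys.argv[1]) as f:
        for ns, name, msg in find_stuck_deployments(json.load(f)):
            print(f"{ns}/{name}: {msg}")
```

For example, `kubectl get flinkdeployments -A -o json > fd.json` followed by `python find_stuck.py fd.json` would print one `namespace/name: message` line per stuck deployment, which could then be matched against the operator logs around the leader switch.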



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
