After upgrading the Flink Kubernetes Operator from v1.11 to v1.12, job upgrades
started to fail for all my jobs with the following error message:
```
Error during event processing ExecutionScope{ resource id:
ResourceID{name='my-job-checkpoint-periodic-1741010907590',
namespace='platform'}, version: 2446801878}
```
The upgrade was failing in a very weird way:
- First, a savepoint was taken and uploaded to S3
- After some time, that savepoint was removed from S3 but not from the
cluster CR
- The upgrade then failed because the savepoint could not be found
Could this be related to the following change from the 1.12.0 release notes?
- https://flink.apache.org/2025/06/03/apache-flink-kubernetes-operator-1.12.0-release-announcement/#bug-fixes-and-stability-enhancements
*Savepoint Information Update*: Fixed a bug where upgrade savepoints were
not added to the deprecated savepointInfo, ensuring accurate tracking of
savepoints during upgrades.
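
To make the mismatch concrete (savepoint gone from S3 but still referenced by the CR), here is a minimal Python sketch that lists savepoint locations a FlinkDeployment status still points to but that no longer exist in storage. The field names (`status.jobStatus.savepointInfo.lastSavepoint`, `savepointHistory`, `upgradeSavepointPath`) are my assumption about the CR layout, and `dangling_savepoints` is a hypothetical helper, not part of the operator. The CR is expected as a dict (e.g. parsed from `kubectl get ... -o json`) and the existing paths as a plain list obtained separately.

```python
from __future__ import annotations

from typing import Iterable


def dangling_savepoints(cr: dict, existing_paths: Iterable[str]) -> list[str]:
    """Return savepoint locations referenced by the CR status but absent from storage.

    Assumed (not verified) status layout:
      status.jobStatus.savepointInfo.lastSavepoint.location
      status.jobStatus.savepointInfo.savepointHistory[].location
      status.jobStatus.upgradeSavepointPath
    """
    existing = set(existing_paths)
    job_status = (cr.get("status") or {}).get("jobStatus") or {}
    info = job_status.get("savepointInfo") or {}

    # Collect every location the CR still references, in a stable order.
    referenced = []
    last = info.get("lastSavepoint") or {}
    if last.get("location"):
        referenced.append(last["location"])
    for entry in info.get("savepointHistory") or []:
        if entry.get("location"):
            referenced.append(entry["location"])
    if job_status.get("upgradeSavepointPath"):
        referenced.append(job_status["upgradeSavepointPath"])

    # Keep only locations missing from storage, without duplicates.
    seen = set()
    missing = []
    for location in referenced:
        if location not in existing and location not in seen:
            seen.add(location)
            missing.append(location)
    return missing
```

A non-empty result for a job whose upgrade fails would match the behaviour described above; an empty result would point to a different cause.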
In case it helps, here is the complete stack trace:
```json
{
"threadId": 352,
"loggerFqcn": "org.apache.logging.slf4j.Log4jLogger",
"level": "ERROR",
"thrown": {
"extendedStackTrace": [
{
"file": "Controller.java",
"method": "cleanup",
"line": 212,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class": "io.javaoperatorsdk.operator.processing.Controller",
"version": "1.12.0"
},
{
"file": "ReconciliationDispatcher.java",
"method": "handleCleanup",
"line": 291,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class":
"io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher",
"version": "1.12.0"
},
{
"file": "ReconciliationDispatcher.java",
"method": "handleDispatch",
"line": 89,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class":
"io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher",
"version": "1.12.0"
},
{
"file": "ReconciliationDispatcher.java",
"method": "handleExecution",
"line": 64,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class":
"io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher",
"version": "1.12.0"
},
{
"file": "EventProcessor.java",
"method": "run",
"line": 452,
"exact": true,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class":
"io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor",
"version": "1.12.0"
},
{
"method": "runWorker",
"line": -1,
"exact": true,
"location": "?",
"class": "java.util.concurrent.ThreadPoolExecutor",
"version": "?"
},
{
"method": "run",
"line": -1,
"exact": true,
"location": "?",
"class": "java.util.concurrent.ThreadPoolExecutor$Worker",
"version": "?"
},
{
"method": "run",
"line": -1,
"exact": true,
"location": "?",
"class": "java.lang.Thread",
"version": "?"
}
],
"localizedMessage": "java.lang.NullPointerException",
"name": "io.javaoperatorsdk.operator.OperatorException",
"cause": {
"extendedStackTrace": [
{
"file": "FlinkResourceContextFactory.java",
"method": "getFlinkStateSnapshotContext",
"line": 96,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class":
"org.apache.flink.kubernetes.operator.service.FlinkResourceContextFactory",
"version": "1.12.0"
},
{
"file": "FlinkStateSnapshotController.java",
"method": "cleanup",
"line": 97,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class":
"org.apache.flink.kubernetes.operator.controller.FlinkStateSnapshotController",
"version": "1.12.0"
},
{
"file": "FlinkStateSnapshotController.java",
"method": "cleanup",
"line": 55,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class":
"org.apache.flink.kubernetes.operator.controller.FlinkStateSnapshotController",
"version": "1.12.0"
},
{
"file": "Controller.java",
"method": "execute",
"line": 199,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class": "io.javaoperatorsdk.operator.processing.Controller$2",
"version": "1.12.0"
},
{
"file": "Controller.java",
"method": "execute",
"line": 162,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class": "io.javaoperatorsdk.operator.processing.Controller$2",
"version": "1.12.0"
},
{
"file": "OperatorJosdkMetrics.java",
"method": "timeControllerExecution",
"line": 80,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class":
"org.apache.flink.kubernetes.operator.metrics.OperatorJosdkMetrics",
"version": "1.12.0"
},
{
"file": "Controller.java",
"method": "cleanup",
"line": 161,
"exact": false,
"location": "flink-kubernetes-operator-1.12.0-shaded.jar",
"class": "io.javaoperatorsdk.operator.processing.Controller",
"version": "1.12.0"
}
],
"name": "java.lang.NullPointerException",
"commonElementCount": 7
},
"commonElementCount": 0,
"message": "java.lang.NullPointerException"
},
"endOfBatch": false,
"thread": "ReconcilerExecutor-flinkstatesnapshotcontroller-352",
"loggerName":
"io.javaoperatorsdk.operator.processing.event.EventProcessor",
"threadPriority": 5,
"instant": {
"epochSecond": 1750744905,
"nanoOfSecond": 13000000
}
}
```
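
Since the JSON layout is hard to read, here is a small Python sketch that flattens the `thrown` object of such a log event into a conventional Java-style trace, following the `cause` chain. It only relies on the keys visible in the event above (`name`, `message`, `extendedStackTrace`, `cause`, and per-frame `class`, `method`, `file`, `line`); `format_thrown` is a hypothetical helper name.

```python
import json


def format_thrown(thrown: dict) -> list[str]:
    """Flatten a Log4j JSON 'thrown' object into Java-style stack trace lines."""
    lines = []
    prefix = ""
    while thrown:
        header = thrown.get("name", "?")
        if thrown.get("message"):
            header += ": " + thrown["message"]
        lines.append(prefix + header)
        for frame in thrown.get("extendedStackTrace", []):
            # JDK frames in the event carry no "file" key, so fall back to a placeholder.
            source = frame.get("file", "Unknown Source")
            lines.append(
                f"    at {frame['class']}.{frame['method']}({source}:{frame.get('line', -1)})"
            )
        # Walk down to the nested cause, if any.
        thrown = thrown.get("cause")
        prefix = "Caused by: "
    return lines


def format_event(raw: str) -> str:
    """Parse one JSON log event and return its trace as a single string."""
    return "\n".join(format_thrown(json.loads(raw).get("thrown") or {}))
```

Applied to the event above, the first "Caused by" frame is `FlinkResourceContextFactory.getFlinkStateSnapshotContext` (line 96), called from `FlinkStateSnapshotController.cleanup`, which is where the NullPointerException originates.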
On 2025/03/04 08:29:20 Salva Alcántara wrote:
> Hey all! I recently bumped the Flink Kubernetes Operator to v1.10.0 and one
> of the things I wanted to check is the usage of the new FlinkStateSnapshot
> CRD. I confirmed that the CRD was correctly created in my cluster, however
> I'm still seeing these logs:
>
> ```
> Starting Operator
> 2025-03-01T08:31:08.779422Z main ERROR appender CONSOLE has no parameter that matches element JsonLayout
> 2025-03-01T08:31:08.782927Z main ERROR Unable to locate appender "ConsoleAppender" for logger config "root"
> 2025-03-01 08:31:12,885 i.f.k.c.d.i.VersionUsageUtils [WARN ] The client is using resource type 'flinkstatesnapshots' with unstable version 'v1beta1'
> 2025-03-01 08:31:14,180 o.a.f.k.o.c.FlinkConfigManager [WARN ] FlinkStateSnapshot CRD was not installed, snapshot resources will be disabled!
> ```
>
> I think this relates to RBAC. For what it's worth, the "FlinkStateSnapshot
> CRD was not installed" log message goes away if I switch to a cluster-wide
> installation (which handles RBAC via clusterrole & clusterrolebinding).
> However, for a namespaced installation like mine (using a non-empty array
> for watchNamespaces) there must be something wrong, despite RBAC apparently
> being right, i.e.:
>
> ```
> kubectl auth can-i list flinkstatesnapshot -n a-watched-namespace \
>   --as=system:serviceaccount:flink-operator:flink-operator
> yes
> ```
>
> The answer is the same for any namespace within watchNamespaces (w.r.t.
> flink-operator, which is where I deploy the operator).
>
> The issue might be in this line:
>
> - https://github.com/apache/flink-kubernetes-operator/blob/9eb3c385b90a5a2f08376720f3204d1784981a0c/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/KubernetesClientUtils.java#L72C31-L72C67
>
> which does not pass any special config. Maybe the idea was to use
> getKubernetesClient instead? Can anyone help troubleshoot the issue?
>