[ https://issues.apache.org/jira/browse/FLINK-32212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875838#comment-17875838 ]
Alexis Leclerc commented on FLINK-32212:
----------------------------------------

We've encountered this issue as well, during an automated node rotation that impacted only the JobManager of a deployment. As recommended above, it was resolved by restarting the JobManager manually.

Rotations impacting only the JobManager of a deployment have happened a number of times in the past, but this was the first time we noticed that the K8S operator was among the services impacted by the rotation as well. I have a hard time seeing the correlation, but I thought it worth noting.

For reference, this is on K8S Operator 1.8.0 with the deployment running Flink 1.18.1.

> Job restarting indefinitely after an IllegalStateException from BlobLibraryCacheManager
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-32212
>                 URL: https://issues.apache.org/jira/browse/FLINK-32212
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.16.1
>         Environment: Apache Flink Kubernetes Operator 1.4
>            Reporter: Matheus Felisberto
>            Priority: Major
>
> After running for a few hours the job starts to throw IllegalStateException and I can't figure out why. To restore the job, I need to manually delete the FlinkDeployment so that it is recreated, and redeploy everything.
> The jar is built into the Docker image, so it is defined according to the Operator's documentation:
> {code:java}
> // jarURI: local:///opt/flink/usrlib/my-job.jar
> {code}
> I've tried to move it into /opt/flink/lib/my-job.jar but it didn't work either.
>
> {code:java}
> // Source: my-topic (1/2)#30587 (b82d2c7f9696449a2d9f4dc298c0a008_bc764cd8ddf7a0cff126f51c16239658_0_30587) switched from DEPLOYING to FAILED with failure cause:
> java.lang.IllegalStateException: The library registration references a different set of library BLOBs than previous registrations for this job:
> old:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-7237ecbb12b0b021934b0c81aef78396]
> new:[p-5d91888083d38a3ff0b6c350f05a3013632137c6-943737c6790a3ec6870cecd652b956c2]
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.verifyClassLoader(BlobLibraryCacheManager.java:419)
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$ResolvedClassLoader.access$500(BlobLibraryCacheManager.java:359)
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.getOrResolveClassLoader(BlobLibraryCacheManager.java:235)
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$LibraryCacheEntry.access$1100(BlobLibraryCacheManager.java:202)
>     at org.apache.flink.runtime.execution.librarycache.BlobLibraryCacheManager$DefaultClassLoaderLease.getOrResolveClassLoader(BlobLibraryCacheManager.java:336)
>     at org.apache.flink.runtime.taskmanager.Task.createUserCodeClassloader(Task.java:1024)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:612)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
>     at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> If there is any other information that can help to identify the problem, please let me know.
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)