Hi Jean-Marc, I think this is related to https://issues.apache.org/jira/browse/FLINK-34455 . Could you provide more information about your setup and how you upgrade your job? Have you re-compiled your job under 1.20?
Best, Zakelly On Fri, Aug 23, 2024 at 1:36 AM Jean-Marc Paulin <j...@uk.ibm.com> wrote: > Hi, we have an error when Flink 1.20 resume a job from a savepoint that > was written with Flink 1.19 (We are in an upgrade scenario here). And we > hit the error `Caused by: java.lang.ClassNotFoundException: > org.apache.flink.runtime.jobgraph.RestoreMode`, see below > > So I get that `org.apache.flink.runtime.jobgraph.RestoreMode#LEGACY` was > deprecated in Flink 1.19, but the whole class has been remove in Flink > 1.20. I don't think we ever serialized that class in our code, but sommehow > the sjob recovery needs it. I can only suppose that was part of the > serialization in the savepoint. Did I miss a step somwwhere od there is an > issue here? > > I see a similar issue in Flink-CDC here: > https://issues.apache.org/jira/browse/FLINK-36105, but I do not think we > use this (unless it is internally via other dependencies). > > Any thoughts? > > Kind Regards > > JM > > ``` > java.util.concurrent.CompletionException: > org.apache.flink.util.FlinkRuntimeException: Could not recover job with job > id 2c5240ef004943b94e752c06eade10bd. > at > java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) > at > java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:649) > at > java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at java.base/java.lang.Thread.run(Thread.java:857) > Caused by: org.apache.flink.util.FlinkRuntimeException: Could not recover > job with job id 2c5240ef004943b94e752c06eade10bd. > at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.tryRecoverJob(SessionDispatcherLeaderProcess.java:183) > at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:150) > at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$recoverJobsIfRunning$2(SessionDispatcherLeaderProcess.java:139) > at > org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) > at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:139) > at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$createDispatcherBasedOnRecoveredJobGraphsAndRecoveredDirtyJobResults$1(SessionDispatcherLeaderProcess.java:129) > at > java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646) > ... 4 more > Caused by: org.apache.flink.util.FlinkException: Could not retrieve > submitted JobGraph from state handle under > /2c5240ef004943b94e752c06eade10bd. This indicates that you are trying to > recover from state written by an older Flink version which is not > compatible. Try cleaning the state handle store. > at > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:170) > at > org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.tryRecoverJob(SessionDispatcherLeaderProcess.java:174) > ... 10 more > Caused by: java.lang.ClassNotFoundException: > org.apache.flink.runtime.jobgraph.RestoreMode > at java.base/java.lang.Class.forNameImpl(Native Method) > at java.base/java.lang.Class.forName(Class.java:429) > at > org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78) > at > java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2153) > at > java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:2017) > at > java.base/java.io.ObjectInputStream.readEnum(ObjectInputStream.java:2295) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1846) > at > java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2732) > at > java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2581) > at > java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2376) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1852) > at > java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2732) > at > java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2581) > at > java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2376) > at > java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1852) > at > java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:591) > at > java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:501) > at > org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:533) > at > org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:59) > at > org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:168) > ... 11 more > ``` > > > Unless otherwise stated above: > > IBM United Kingdom Limited > Registered in England and Wales with number 741598 > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU >