Hi Jean-Marc,

I think this is related to https://issues.apache.org/jira/browse/FLINK-34455
. Could you provide more information about your setup and how you upgrade
your job? Have you re-compiled your job under 1.20?


Best,
Zakelly


On Fri, Aug 23, 2024 at 1:36 AM Jean-Marc Paulin <j...@uk.ibm.com> wrote:

> Hi, we have an error when Flink 1.20 resume a job from a savepoint that
> was written with Flink 1.19 (We are in an upgrade scenario here). And we
> hit the error `Caused by: java.lang.ClassNotFoundException:
> org.apache.flink.runtime.jobgraph.RestoreMode`, see below
>
> So I get that `org.apache.flink.runtime.jobgraph.RestoreMode#LEGACY` was
> deprecated in Flink 1.19, but the whole class has been remove in Flink
> 1.20. I don't think we ever serialized that class in our code, but sommehow
> the sjob recovery needs it. I can only suppose that was part of the
> serialization in the savepoint. Did I miss a step somwwhere od there is an
> issue here?
>
> I see a similar issue in Flink-CDC here:
> https://issues.apache.org/jira/browse/FLINK-36105, but I do not think we
> use this (unless it is internally via other dependencies).
>
> Any thoughts?
>
> Kind Regards
>
> JM
>
> ```
> java.util.concurrent.CompletionException:
> org.apache.flink.util.FlinkRuntimeException: Could not recover job with job
> id 2c5240ef004943b94e752c06eade10bd.
>   at
> java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
>   at
> java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
>   at
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:649)
>   at
> java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
>   at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>   at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>   at java.base/java.lang.Thread.run(Thread.java:857)
> Caused by: org.apache.flink.util.FlinkRuntimeException: Could not recover
> job with job id 2c5240ef004943b94e752c06eade10bd.
>   at
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.tryRecoverJob(SessionDispatcherLeaderProcess.java:183)
>   at
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobs(SessionDispatcherLeaderProcess.java:150)
>   at
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$recoverJobsIfRunning$2(SessionDispatcherLeaderProcess.java:139)
>   at
> org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198)
>   at
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.recoverJobsIfRunning(SessionDispatcherLeaderProcess.java:139)
>   at
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.lambda$createDispatcherBasedOnRecoveredJobGraphsAndRecoveredDirtyJobResults$1(SessionDispatcherLeaderProcess.java:129)
>   at
> java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
>   ... 4 more
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
> submitted JobGraph from state handle under
> /2c5240ef004943b94e752c06eade10bd. This indicates that you are trying to
> recover from state written by an older Flink version which is not
> compatible. Try cleaning the state handle store.
>   at
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:170)
>   at
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.tryRecoverJob(SessionDispatcherLeaderProcess.java:174)
>   ... 10 more
> Caused by: java.lang.ClassNotFoundException:
> org.apache.flink.runtime.jobgraph.RestoreMode
>   at java.base/java.lang.Class.forNameImpl(Native Method)
>   at java.base/java.lang.Class.forName(Class.java:429)
>   at
> org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:78)
>   at
> java.base/java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2153)
>   at
> java.base/java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:2017)
>   at
> java.base/java.io.ObjectInputStream.readEnum(ObjectInputStream.java:2295)
>   at
> java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1846)
>   at
> java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2732)
>   at
> java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2581)
>   at
> java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2376)
>   at
> java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1852)
>   at
> java.base/java.io.ObjectInputStream$FieldValues.<init>(ObjectInputStream.java:2732)
>   at
> java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2581)
>   at
> java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2376)
>   at
> java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1852)
>   at
> java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:591)
>   at
> java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:501)
>   at
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:533)
>   at
> org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:59)
>   at
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore.recoverJobGraph(DefaultJobGraphStore.java:168)
>   ... 11 more
>   ```
>
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>

Reply via email to