[ https://issues.apache.org/jira/browse/FLINK-24621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chesnay Schepler updated FLINK-24621:
-------------------------------------
    Priority: Blocker  (was: Major)

> JobManager fails to recover 1.13.1 checkpoint due to InflightDataRescalingDescriptor
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-24621
>                 URL: https://issues.apache.org/jira/browse/FLINK-24621
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.13.2, 1.13.3
>            Reporter: Chesnay Schepler
>            Priority: Blocker
>
> A user reported on the mailing list that a JM is unable to read a 1.13.1 checkpoint.
> The big question is why the InflightDataRescalingDescriptor is creating problems, because it should not actually be contained in a checkpoint.
> {code}
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve checkpoint 2844 from state handle under checkpointID-0000000000000002844. This indicates that the retrieved state handle is broken. Try cleaning the state handle store.
>     at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore.retrieveCompletedCheckpoint(DefaultCompletedCheckpointStore.java:309) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore.recover(DefaultCompletedCheckpointStore.java:151) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedStateInternal(CheckpointCoordinator.java:1513) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreInitialCheckpointIfPresent(CheckpointCoordinator.java:1476) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:134) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:342) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112) ~[flink-dist_2.12-1.13.3.jar:1.13.3]
>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source) ~[?:?]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
>     at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
>     at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: java.io.InvalidClassException: org.apache.flink.runtime.checkpoint.InflightDataRescalingDescriptor$NoRescalingDescriptor; local class incompatible: stream classdesc serialVersionUID = -5544173933105855751, local class serialVersionUID = 1
>     at java.io.ObjectStreamClass.initNonProxy(Unknown Source) ~[?:?]
>     at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source) ~[?:?]
> {code}
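
For anyone trying to narrow this down: the InvalidClassException above means the serialVersionUID recorded in the serialized stream differs from the one the local class reports. One plausible reading, which is an assumption and not confirmed in this ticket, is that the writing side had no explicit serialVersionUID (so the JVM derived one from the class structure) while the 1.13.3 class declares 1. Below is a minimal sketch for printing the UID a JVM assigns to a class, so the same check could be run against different flink-dist jars. SerialVersionUidDemo, Implicit, Pinned and uidOf are hypothetical names for illustration and are not part of Flink; only java.io.ObjectStreamClass is a real API here.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical demo class, not part of Flink.
class SerialVersionUidDemo {

    // No explicit serialVersionUID: the JVM computes a default from the
    // class name, fields, methods and so on. Changing the class changes it.
    static class Implicit implements Serializable {
        int value;
    }

    // Explicit serialVersionUID: stays at 1 regardless of structural changes.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Returns the serialVersionUID Java serialization uses for the given class.
    static long uidOf(Class<?> type) {
        // lookup() returns null for classes that are not Serializable.
        ObjectStreamClass desc = ObjectStreamClass.lookup(type);
        if (desc == null) {
            throw new IllegalArgumentException(type.getName() + " is not Serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Implicit: " + uidOf(Implicit.class));
        System.out.println("Pinned:   " + uidOf(Pinned.class));
    }
}
```

To apply this to the case above, one would call uidOf on Class.forName("org.apache.flink.runtime.checkpoint.InflightDataRescalingDescriptor$NoRescalingDescriptor") with the flink-dist jar of each version on the classpath and compare the results with the two values in the exception message. This only shows which version produces which UID; it does not explain why the descriptor ended up in the checkpoint in the first place.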