Thank you for your response.

On Tue, Nov 9, 2021 at 6:55 AM Dawid Wysakowicz <dwysakow...@apache.org>
wrote:

> Hey Sweta,
>
> Sorry I did not get back to you earlier.
>
> Could you explain how do you do the upgrade? Do you try to upgrade your
> cluster through HA services (e.g. zookeeper)? Meaning you bring down the
> 1.13.1 cluster down and start a 1.13.2/3 cluster which you intend to pick
> up the job automatically along with the latest checkpoint? Am I guessing
> correct? As far as I can tell we do not support such a way of upgrading.
>
> The way we support upgrades is via a savepoint/checkpoint. I'd suggest to
> either take a savepoint on 1.13.1 and restore[1] the job on 1.13.2 cluster
> or use an externalized checkpoint created from 1.13.1.
>
> Best,
>
> Dawid
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/cli/#starting-a-job-from-a-savepoint
> On 22/10/2021 16:39, Chesnay Schepler wrote:
>
> The only suggestion I can offer is to take a savepoint with 1.13.1 and try
> to restore from that.
>
> We will investigate the problem in
> https://issues.apache.org/jira/browse/FLINK-24621; currently we don't
> know why you are experiencing this issue.
>
> On 22/10/2021 16:02, Sweta Kalakuntla wrote:
>
> Hi,
>
> We are seeing error while upgrading minor versions from 1.13.1 to 1.13.2.
> JobManager is unable to recover the checkpoint state. What would be the
> solution to this issue?
>
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
> checkpoint 2844 from state handle under checkpointID-0000000000000002844.
> This indicates that the retrieved state handle is broken. Try cleaning the
> state handle store.
> at
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore.retrieveCompletedCheckpoint(DefaultCompletedCheckpointStore.java:309)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore.recover(DefaultCompletedCheckpointStore.java:151)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedStateInternal(CheckpointCoordinator.java:1513)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreInitialCheckpointIfPresent(CheckpointCoordinator.java:1476)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:134)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:342)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:190)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:122)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:132)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:110)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:340)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:317)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:107)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:95)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
> ~[flink-dist_2.12-1.13.3.jar:1.13.3]
> at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)
> ~[?:?]
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> ~[?:?]
> at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
> Source) ~[?:?]
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> ~[?:?]
> at java.lang.Thread.run(Unknown Source) ~[?:?]
> Caused by: java.io.InvalidClassException:
> org.apache.flink.runtime.checkpoint.InflightDataRescalingDescriptor$NoRescalingDescriptor;
> local class incompatible: stream classdesc serialVersionUID =
> -5544173933105855751, local class serialVersionUID = 1
> at java.io.ObjectStreamClass.initNonProxy(Unknown Source) ~[?:?]
> at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source) ~[?:?]
>
>
>
> Thank you,
>
> Sweta K
>
>
>
>

Reply via email to