Hi Partha
     The exception here said that there is some operator in the
checkpoint/savepoint, but not in the new program.
     As you said that, both times use the same user program binary,  this
seems a little strange to me. did you ever set the uid for all the
operators?

Best,
Congxian


Partha Mishra <partha.mis...@man-es.com> 于2020年10月23日周五 下午3:02写道:

> Hi,
>
>
>
> None of the operator is renamed or removed. Testing is carried out with
> exactly same binary used with 1.9 and 1.11.2. Checkpoint saved in 1.9 is
> not being able to retrieve in 1.11.2
>
>
>
>
>
> *From:* Sivaprasanna <sivaprasanna...@gmail.com>
> *Sent:* Friday, October 23, 2020 10:57 AM
> *To:* Partha Mishra <partha.mis...@man-es.com>
> *Cc:* user@flink.apache.org
> *Subject:* Re: Resuming Savepoint issue with upgraded Flink version 1.11.2
>
>
>
> Hi,
>
>
>
> Have you dropped or renamed any operator from the original job? If yes,
> and you are okay with discarding the state of that operator, you can submit
> the job with --allowNonRestoredState or -n.
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#allowing-non-restored-state
>
>
>
> -
>
> Sivaprasanna
>
>
>
> On Fri, Oct 23, 2020 at 10:48 AM Partha Mishra <partha.mis...@man-es.com>
> wrote:
>
> Hi,
>
>
>
> We are trying to save checkpoints for one of the flink job running in
> Flink version 1.9 and tried to resume the same flink job in Flink version
> 1.11.2. We are getting the below error when trying to restore the saved
> checkpoint in the newer flink version. Can
>
>
>
> Cannot map checkpoint/savepoint state for operator
> fbb4ef531e002f8fb3a2052db255adf5 to the new program, because the operator
> is not available in the new program.
>
>
>
>
>
> *Complete Stack Trace :*
>
> {"errors":["org.apache.flink.runtime.rest.handler.RestHandlerException:
> Could not execute application.\n\tat
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$1(JarRunHandler.java:103)\n\tat
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)\n\tat
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)\n\tat
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1609)\n\tat
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat
> java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)\n\tat
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)\n\tat
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat
> java.lang.Thread.run(Thread.java:748)\nCaused by:
> java.util.concurrent.CompletionException:
> org.apache.flink.util.FlinkRuntimeException: Could not execute
> application.\n\tat
> java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)\n\tat
> java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)\n\tat
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)\n\t...
> 7 more\nCaused by: org.apache.flink.util.FlinkRuntimeException: Could not
> execute application.\n\tat
> org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:81)\n\tat
> org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:67)\n\tat
> org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$0(JarRunHandler.java:100)\n\tat
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)\n\t...
> 7 more\nCaused by:
> org.apache.flink.client.program.ProgramInvocationException: The main method
> caused an error: Failed to execute job
> 'ST1_100Services-preprod-Tumbling-ProcessedBased'.\n\tat
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:302)\n\tat
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)\n\tat
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:149)\n\tat
> org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:78)\n\t...
> 10 more\nCaused by: org.apache.flink.util.FlinkException: Failed to execute
> job 'ST1_100Services-preprod-Tumbling-ProcessedBased'.\n\tat
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1821)\n\tat
> org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:128)\n\tat
> org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)\n\tat
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1697)\n\tat
> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:699)\n\tat
> com.man.ceon.cep.jobs.AnalyticService$.main(AnalyticService.scala:108)\n\tat
> com.man.ceon.cep.jobs.AnalyticService.main(AnalyticService.scala)\n\tat
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
> java.lang.reflect.Method.invoke(Method.java:498)\n\tat
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:288)\n\t...
> 13 more\nCaused by: org.apache.flink.runtime.client.JobSubmissionException:
> Failed to submit job.\n\tat
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$internalSubmitJob$3(Dispatcher.java:344)\n\tat
> java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836)\n\tat
> java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811)\n\tat
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)\n\tat
> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)\n\tat
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:44)\n\tat
> akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)\n\tat
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)\n\tat
> akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)\n\tat
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)\nCaused
> by: org.apache.flink.runtime.client.JobExecutionException: Could not
> instantiate JobManager.\n\tat
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:398)\n\tat
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)\n\t...
> 6 more\nCaused by: java.lang.IllegalStateException: Failed to rollback to
> checkpoint/savepoint
> s3://prodv2-flink-cluster/savepoints/savepoint-b76d18-d302cc7ca666. Cannot
> map checkpoint/savepoint state for operator
> fbb4ef531e002f8fb3a2052db255adf5 to the new program, because the operator
> is not available in the new program. If you want to allow to skip this,
> you can set the --allowNonRestoredState option on the CLI.\n\tat
> org.apache.flink.runtime.checkpoint.Checkpoints.throwNonRestoredStateException(Checkpoints.java:210)\n\tat
> org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:180)\n\tat
> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1397)\n\tat
> org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:300)\n\tat
> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:253)\n\tat
> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:229)\n\tat
> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:119)\n\tat
> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:103)\n\tat
> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:284)\n\tat
> org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:272)\n\tat
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:98)\n\tat
> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:40)\n\tat
> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:140)\n\tat
> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:84)\n\tat
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$6(Dispatcher.java:388)\n\t...
> 7 more\n"]}
>
>
>
>

Reply via email to