Thanks!

On Tue, Dec 7, 2021, 22:55 Robert Metzger <metrob...@gmail.com> wrote:

> 811d3b279c8b26ed99ff0883b7630242 is the operator id.
> If I'm not mistaken, running the job graph generation (e.g. the main
> method) in DEBUG log level will show you all the IDs generated. This should
> help you map this ID to your code.
>
> On Wed, Dec 8, 2021 at 7:52 AM Dan Hill <quietgol...@gmail.com> wrote:
>
>> Nothing changed (as far as I know).  It's the same binary and the same
>> args.  It's Flink v1.12.3.  I'm going to switch away from auto-gen uids and
>> see if that helps.  The job randomly started failing to checkpoint.  I
>> cancelled the job and started it from the last successful checkpoint.
>>
>> I'm confused why `811d3b279c8b26ed99ff0883b7630242` is used and not the
>> auto-generated uid.  That seems like a bug.
>>
>> On Tue, Dec 7, 2021 at 10:40 PM Robert Metzger <metrob...@gmail.com>
>> wrote:
>>
>>> Hi Dan,
>>>
>>> When restoring a savepoint/checkpoint, Flink is matching the state for
>>> the operators based on the uuid of the operator. The exception says that
>>> there is some state that doesn't match any operator. So from Flink's
>>> perspective, the operator is gone.
>>> Here is more information:
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#assigning-operator-ids
>>>
>>>
>>> Somehow something must have changed in your job: Did you change the
>>> Flink version?
>>>
>>> Hope this helps!
>>>
>>> On Wed, Dec 8, 2021 at 5:49 AM Dan Hill <quietgol...@gmail.com> wrote:
>>>
>>>> I'm restoring the job with the same binary and same flags/args.
>>>>
>>>> On Tue, Dec 7, 2021 at 8:48 PM Dan Hill <quietgol...@gmail.com> wrote:
>>>>
>>>>> Hi.  I noticed this warning has "operator
>>>>> 811d3b279c8b26ed99ff0883b7630242" in it.  I assume this should be an
>>>>> operator uid or name.  It looks like something else.  What is it?  Is
>>>>> something corrupted?
>>>>>
>>>>>
>>>>> org.apache.flink.runtime.client.JobInitializationException: Could not 
>>>>> instantiate JobManager.
>>>>>   at 
>>>>> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:494)
>>>>>   at 
>>>>> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
>>>>>   at 
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>   at 
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>   at java.lang.Thread.run(Thread.java:748)
>>>>> Caused by: java.lang.IllegalStateException: Failed to rollback to 
>>>>> checkpoint/savepoint 
>>>>> s3a://my-flink-state/checkpoints/ce9b90eafde97ca4629c13936c34426f/chk-626.
>>>>>  Cannot map checkpoint/savepoint state for operator 
>>>>> 811d3b279c8b26ed99ff0883b7630242 to the new program, because the operator 
>>>>> is not available in the new program. If you want to allow to skip this, 
>>>>> you can set the --allowNonRestoredState option on the CLI.
>>>>>   at 
>>>>> org.apache.flink.runtime.checkpoint.Checkpoints.throwNonRestoredStateException(Checkpoints.java:226)
>>>>>   at 
>>>>> org.apache.flink.runtime.checkpoint.Checkpoints.loadAndValidateCheckpoint(Checkpoints.java:190)
>>>>>   at 
>>>>> org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1627)
>>>>>   at 
>>>>> org.apache.flink.runtime.scheduler.SchedulerBase.tryRestoreExecutionGraphFromSavepoint(SchedulerBase.java:362)
>>>>>   at 
>>>>> org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:292)
>>>>>   at 
>>>>> org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:249)
>>>>>   at 
>>>>> org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:133)
>>>>>   at 
>>>>> org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:111)
>>>>>   at 
>>>>> org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:345)
>>>>>   at 
>>>>> org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:330)
>>>>>   at 
>>>>> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:95)
>>>>>   at 
>>>>> org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.createJobMasterService(DefaultJobMasterServiceFactory.java:39)
>>>>>   at 
>>>>> org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl.<init>(JobManagerRunnerImpl.java:162)
>>>>>   at 
>>>>> org.apache.flink.runtime.dispatcher.DefaultJobManagerRunnerFactory.createJobManagerRunner(DefaultJobManagerRunnerFactory.java:86)
>>>>>   at 
>>>>> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:478)
>>>>>   ... 4 more
>>>>>
>>>>>

Reply via email to