Hi,

Where exactly did you read that incremental checkpoints cannot reference
files from previous checkpoints? If that is written somewhere, we would have
to correct it: referencing files from previous checkpoints is exactly how
incremental checkpoints work. For this case, though, I would consider it
extremely unlikely that checkpoint 1620 would still reference a file from
checkpoint 1, in particular if the files for that checkpoint are already
deleted, which should only happen once they are no longer referenced.
Which version of Flink are you using, and what is your distributed filesystem?
Is there any way to reproduce the problem?
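
To illustrate the point about shared files, here is a minimal sketch of
reference counting across checkpoints. It is a simplified, hypothetical model
and not Flink's actual implementation: the class name
SharedFileRegistrySketch, its methods, and the file names are assumptions made
up for this example. It only shows the intended invariant that a file written
by an old checkpoint survives for as long as any retained checkpoint still
references it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model, NOT Flink's real code: tracks how many retained
// checkpoints reference each state file.
class SharedFileRegistrySketch {
    // file name -> number of retained checkpoints referencing it
    private final Map<String, Integer> refCounts = new HashMap<>();
    // checkpoint id -> files that checkpoint references (new or reused)
    private final Map<Long, Set<String>> filesByCheckpoint = new HashMap<>();

    // Register a completed checkpoint together with every file it references.
    void register(long checkpointId, String... files) {
        Set<String> fileSet = new HashSet<>();
        for (String f : files) {
            if (fileSet.add(f)) {
                refCounts.merge(f, 1, Integer::sum);
            }
        }
        filesByCheckpoint.put(checkpointId, fileSet);
    }

    // Discard a checkpoint; a file is dropped only when its count reaches zero.
    void discard(long checkpointId) {
        Set<String> files = filesByCheckpoint.remove(checkpointId);
        if (files == null) {
            return;
        }
        for (String f : files) {
            int remaining = refCounts.merge(f, -1, Integer::sum);
            if (remaining == 0) {
                refCounts.remove(f); // stands in for deleting the file
            }
        }
    }

    boolean exists(String file) {
        return refCounts.containsKey(file);
    }

    public static void main(String[] args) {
        SharedFileRegistrySketch registry = new SharedFileRegistrySketch();
        registry.register(1L, "a.sst", "b.sst"); // chk-1 writes two files
        registry.register(2L, "a.sst", "c.sst"); // chk-2 reuses a.sst
        registry.discard(1L);                    // chk-1 is subsumed
        System.out.println("a.sst exists: " + registry.exists("a.sst"));
        System.out.println("b.sst exists: " + registry.exists("b.sst"));
    }
}
```

In this model, discarding checkpoint 1 removes only b.sst, while a.sst stays
because checkpoint 2 still references it. A missing file during restore would
mean this invariant was violated somewhere, which is why the Flink version and
the filesystem matter.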

Best,
Stefan

> On 21.11.2017 at 14:30, gerardg <ger...@talaia.io> wrote:
> 
> Hello,
> 
> We have a task that fails to restart from a checkpoint with the following
> error:
> 
> java.lang.IllegalStateException: Could not initialize keyed state backend.
>       at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321)
>       at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: /home/gluster/flink/checkpoints/fac589c7248186bda2ad7b711f174973/chk-1/a069f85e-4ceb-4fba-9308-fb238f31574f (No such file or directory)
>       at java.io.FileInputStream.open0(Native Method)
>       at java.io.FileInputStream.open(FileInputStream.java:195)
>       at java.io.FileInputStream.<init>(FileInputStream.java:138)
>       at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:49)
>       at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:142)
>       at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
>       at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:70)
>       at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.readStateData(RocksDBKeyedStateBackend.java:1290)
>       at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.readAllStateData(RocksDBKeyedStateBackend.java:1477)
>       at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restoreInstance(RocksDBKeyedStateBackend.java:1333)
>       at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restore(RocksDBKeyedStateBackend.java:1512)
>       at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:979)
>       at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772)
>       at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311)
>       ... 6 common frames omitted
> 
> It seems that it tries to restore the job using checkpoint number 1 (which
> was automatically deleted by Flink), even though the latest checkpoint is
> 1620. I can actually see in the log that it intended to restore from
> checkpoint 1620:
> 
> Found 1 checkpoints in ZooKeeper. 
> Trying to retrieve checkpoint 1620. 
> Restoring from latest valid checkpoint: Checkpoint 1620 @ 1511267100332 for
> fac589c7248186bda2ad7b711f174973.
> 
> I have incremental checkpointing enabled, but I have read many times that
> checkpoints do not reference each other, so I'm not sure what could be
> happening.
> 
> Gerard
