Hello,

We have a task that fails to restart from a checkpoint with the following
error:

java.lang.IllegalStateException: Could not initialize keyed state backend.
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:321)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:217)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:676)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:663)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:252)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /home/gluster/flink/checkpoints/fac589c7248186bda2ad7b711f174973/chk-1/a069f85e-4ceb-4fba-9308-fb238f31574f (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:49)
        at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:142)
        at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
        at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:70)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.readStateData(RocksDBKeyedStateBackend.java:1290)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.readAllStateData(RocksDBKeyedStateBackend.java:1477)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restoreInstance(RocksDBKeyedStateBackend.java:1333)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restore(RocksDBKeyedStateBackend.java:1512)
        at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:979)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.createKeyedStateBackend(StreamTask.java:772)
        at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initKeyedState(AbstractStreamOperator.java:311)
        ... 6 common frames omitted

It seems that it tries to restore the job from checkpoint 1 (which Flink
had already deleted automatically), even though the latest checkpoint is
1620. The log also shows that it intended to restore from checkpoint 1620:

Found 1 checkpoints in ZooKeeper.
Trying to retrieve checkpoint 1620.
Restoring from latest valid checkpoint: Checkpoint 1620 @ 1511267100332 for fac589c7248186bda2ad7b711f174973.
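
To make the mismatch concrete, here is a minimal sketch that pulls the
chk-N number out of the missing file's path and prints it next to the
checkpoint the log says is being restored. It is plain Java, not part of
Flink; the class name CheckpointPathCheck and the method checkpointIdOf are
hypothetical names made up for this illustration, and it assumes the
checkpoint directory always appears as a "/chk-<number>/" path component.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Flink class.
public class CheckpointPathCheck {

    // Assumption: the checkpoint directory appears as "/chk-<N>/" in the path.
    private static final Pattern CHK_DIR = Pattern.compile("/chk-(\\d+)/");

    /** Returns the checkpoint number embedded in the path, or -1 if none is found. */
    public static long checkpointIdOf(String path) {
        Matcher m = CHK_DIR.matcher(path);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        // Path copied from the FileNotFoundException above.
        String missingFile = "/home/gluster/flink/checkpoints/"
                + "fac589c7248186bda2ad7b711f174973/chk-1/"
                + "a069f85e-4ceb-4fba-9308-fb238f31574f";
        // Checkpoint number copied from the restore log lines above.
        long restoredCheckpoint = 1620L;

        long referenced = checkpointIdOf(missingFile);
        System.out.println("Restoring checkpoint " + restoredCheckpoint
                + ", but the missing file lives under chk-" + referenced);
    }
}
```

With the values from this report, the referenced directory (chk-1) differs
from the checkpoint being restored (1620).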

I have incremental checkpointing enabled, but I have read many times that
checkpoints do not reference each other, so I'm not sure what could be
happening.

Gerard



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/