[ 
https://issues.apache.org/jira/browse/FLINK-20192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antti Kaikkonen updated FLINK-20192:
------------------------------------
    Description: 
When I try to restore from the externalized checkpoint located at 
+/home/anttkaik/flink/checkpoints/0fc94de8d94e123585b5baed6972dbe8/chk-12+, I 
get the following error: 
  
{code:java}
java.lang.Exception: Exception while creating StreamOperatorStateContext.
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:204)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:247)
    at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:290)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:479)
    at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:475)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:528)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for FunctionGroupOperator_6b87a4870d0e21cecbbe271bd893cfcc_(2/4) from any of the 1 provided restore options.
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:317)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:144)
    ... 9 more
Caused by: org.apache.flink.runtime.state.BackendBuildingException: Caught unexpected exception.
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:329)
    at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:535)
    at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:301)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:142)
    at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:121)
    ... 11 more
Caused by: java.io.FileNotFoundException: /home/anttkaik/flink/checkpoints/01dbaf21d7c5e8f8eabd3602e086bb89/shared/0a3c0c1d-c924-4e6d-b6ad-463a75c9fce8 (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
    at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:143)
    at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
    at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:69)
    at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForStateHandle(RocksDBStateDownloader.java:126)
    at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.lambda$createDownloadRunnables$0(RocksDBStateDownloader.java:109)
    at org.apache.flink.util.function.ThrowingRunnable.lambda$unchecked$0(ThrowingRunnable.java:50)
    at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
    at org.apache.flink.runtime.concurrent.DirectExecutorService.execute(DirectExecutorService.java:211)
    at java.util.concurrent.CompletableFuture.asyncRunStage(CompletableFuture.java:1654)
    at java.util.concurrent.CompletableFuture.runAsync(CompletableFuture.java:1871)
    at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.downloadDataForAllStateHandles(RocksDBStateDownloader.java:83)
    at org.apache.flink.contrib.streaming.state.RocksDBStateDownloader.transferAllStateDataToDirectory(RocksDBStateDownloader.java:66)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.transferRemoteStateToLocalDirectory(RocksDBIncrementalRestoreOperation.java:230)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreFromRemoteState(RocksDBIncrementalRestoreOperation.java:195)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.initDBWithRescaling(RocksDBIncrementalRestoreOperation.java:342)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restoreWithRescaling(RocksDBIncrementalRestoreOperation.java:276)
    at org.apache.flink.contrib.streaming.state.restore.RocksDBIncrementalRestoreOperation.restore(RocksDBIncrementalRestoreOperation.java:153)
    at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:270)
    ... 15 more
{code}
The job +0fc94de8d94e123585b5baed6972dbe8+ was restored from an externalized 
checkpoint generated by +01dbaf21d7c5e8f8eabd3602e086bb89+. After the 
restoration succeeded and +0fc94de8d94e123585b5baed6972dbe8+ had generated 
new externalized checkpoints of its own, I thought it was safe to delete the 
checkpoints from +01dbaf21d7c5e8f8eabd3602e086bb89+, but apparently I was wrong.

I have attached the _metadata file from 
+/home/anttkaik/flink/checkpoints/0fc94de8d94e123585b5baed6972dbe8/chk-12+ 
which contains the reference to 
+/home/anttkaik/flink/checkpoints/01dbaf21d7c5e8f8eabd3602e086bb89/shared/0a3c0c1d-c924-4e6d-b6ad-463a75c9fce8+.
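
One way to check for such cross-job references before deleting an older job's checkpoint directory is to scan the _metadata file for paths under the checkpoint root. The following is a minimal Python sketch, not a Flink tool. The helper name `foreign_job_ids` is hypothetical, and the sketch assumes that file paths appear as plain byte strings inside _metadata and that each job directory is named by its 32-character hex job ID. The sample bytes are a synthetic stand-in (the second file name is made up), not the attached file.

```python
import re


def foreign_job_ids(metadata: bytes, checkpoint_root: str, own_job_id: str) -> set:
    """Return job IDs, other than own_job_id, whose directories are referenced in metadata.

    Assumption: paths are stored as readable byte strings of the form
    <checkpoint_root>/<32 hex chars>/... somewhere in the raw file content.
    """
    root = re.escape(checkpoint_root.rstrip("/").encode("utf-8"))
    # Capture the directory name that directly follows the checkpoint root.
    pattern = re.compile(root + rb"/([0-9a-f]{32})/")
    found = {m.group(1).decode("ascii") for m in pattern.finditer(metadata)}
    found.discard(own_job_id)
    return found


# Synthetic stand-in for the raw bytes of a _metadata file.
sample = (
    b"\x00\x01file:/home/anttkaik/flink/checkpoints/"
    b"01dbaf21d7c5e8f8eabd3602e086bb89/shared/0a3c0c1d-c924-4e6d-b6ad-463a75c9fce8"
    b"\x00\x02file:/home/anttkaik/flink/checkpoints/"
    b"0fc94de8d94e123585b5baed6972dbe8/shared/hypothetical-file"
)

others = foreign_job_ids(
    sample,
    "/home/anttkaik/flink/checkpoints",
    "0fc94de8d94e123585b5baed6972dbe8",
)
print(others)
```

For a real file, the bytes could be read with `open(path, "rb").read()`. A non-empty result would suggest that the checkpoint still depends on files owned by another job's directory, so that directory should not be deleted yet.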


> Externalized checkpoint references a checkpoint from a different job
> --------------------------------------------------------------------
>
>                 Key: FLINK-20192
>                 URL: https://issues.apache.org/jira/browse/FLINK-20192
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream, Runtime / Checkpointing
>    Affects Versions: 1.11.2
>            Reporter: Antti Kaikkonen
>            Priority: Major
>         Attachments: _metadata
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
