Hi Navneeth,

Sorry for the late reply. To me it looks as if the EFS volume holding
/mnt/checkpoints/150dee2a70cecdd41b63a06b42a95649/chk-52/76363f89-d19f-44aa-aaf9-b33d89ec7c6c
has not been mounted on the EC2 machine you are using to run the job. Could
you try to log in to the machine when the problem occurs and check whether
you can open the checkpoint path? The EFS troubleshooting guide [1] might
also help.

[1] https://docs.aws.amazon.com/efs/latest/ug/troubleshooting.html
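
A minimal sketch of the check suggested above, to be run on the machine (or
inside the container) where the task fails. The function name
check_checkpoint_path and the assumption that /mnt/checkpoints is the EFS
mount point are illustrative and not taken from your setup; adjust them to
match your task definition.

```python
import os


def check_checkpoint_path(path, mount_point="/mnt/checkpoints"):
    """Report whether a checkpoint file is visible from this machine.

    mount_point is an assumed location for the EFS mount; change it if
    your volume is mounted elsewhere.
    """
    return {
        # Does the directory exist at all on this host/container?
        "mount_point_exists": os.path.isdir(mount_point),
        # Is a file system actually mounted there (as opposed to an
        # empty local directory with the same name)?
        "is_mount": os.path.ismount(mount_point),
        # Is the checkpoint file itself present?
        "file_exists": os.path.isfile(path),
        # Can the current user read it?
        "readable": os.access(path, os.R_OK),
    }


if __name__ == "__main__":
    # Path copied from the FileNotFoundException in the stack trace.
    state_file = (
        "/mnt/checkpoints/150dee2a70cecdd41b63a06b42a95649/chk-52/"
        "76363f89-d19f-44aa-aaf9-b33d89ec7c6c"
    )
    for name, value in check_checkpoint_path(state_file).items():
        print(f"{name}: {value}")
```

If mount_point_exists is True but is_mount is False, the directory is most
likely a plain local directory rather than the EFS volume, which would match
the FileNotFoundException even though the file is visible from another host.
Note that a bind mount into a container may be reported differently, so treat
the result as a hint rather than proof.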

Cheers,
Till

On Wed, Oct 21, 2020 at 7:46 PM Navneeth Krishnan <reachnavnee...@gmail.com>
wrote:

> Hi All,
>
> Any feedback on how this can be resolved? This is causing downtime in
> production.
>
> Thanks
>
>
>
> On Tue, Oct 20, 2020 at 4:39 PM Navneeth Krishnan <
> reachnavnee...@gmail.com> wrote:
>
>> Hi All,
>>
>> I'm facing an issue in our Flink application. It happens in versions
>> 1.4.0 and 1.7.2; we run both and see the problem on both. We are running
>> Flink on ECS with checkpointing enabled to EFS. When the pipeline restarts
>> due to a node failure or any other reason, it just keeps restarting until
>> the retry attempts are exhausted, with this same error message each time.
>> When I checked the EFS volume, I do see that the file is still available,
>> but for some reason Flink is unable to recover the job. Any pointers will
>> help. Thanks
>>
>> java.lang.Exception: Exception while creating StreamOperatorStateContext.
>>      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:195)
>>      at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
>>      at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
>>      at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
>>      at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
>>      at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.flink.util.FlinkException: Could not restore operator state backend for StreamSource_cbc357ccb763df2852fee8c4fc7d55f2_(14/18) from any of the 1 provided restore options.
>>      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
>>      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.operatorStateBackend(StreamTaskStateInitializerImpl.java:245)
>>      at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:143)
>>      ... 5 more
>> Caused by: java.io.FileNotFoundException: /mnt/checkpoints/150dee2a70cecdd41b63a06b42a95649/chk-52/76363f89-d19f-44aa-aaf9-b33d89ec7c6c (No such file or directory)
>>      at java.io.FileInputStream.open0(Native Method)
>>      at java.io.FileInputStream.open(FileInputStream.java:195)
>>      at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>      at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
>>      at org.apache.flink.core.fs.local.LocalFileSystem.open(LocalFileSystem.java:142)
>>      at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.open(SafetyNetWrapperFileSystem.java:85)
>>      at org.apache.flink.runtime.state.filesystem.FileStateHandle.openInputStream(FileStateHandle.java:68)
>>      at org.apache.flink.runtime.state.OperatorStreamStateHandle.openInputStream(OperatorStreamStateHandle.java:66)
>>      at org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:286)
>>      at org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:62)
>>      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
>>      at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
>>      ... 7 more
>>
>>
>> *EFS:*
>>
>>
>> [image: image.png]
>>
>>
>>
>> Thanks
>>
>>
