My guess is that the /tmp directory got cleaned on your host and Flink couldn't restore memory state from it upon startup.
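If that is the cause, one thing to try is pointing Flink's temporary I/O directories at a location that is not subject to periodic cleanup. A minimal sketch for flink-conf.yaml, assuming Flink 1.5 or later (where the key is `io.tmp.dirs`; older releases used `taskmanager.tmp.dirs`). The path below is only a placeholder, not something from your setup:

```yaml
# flink-conf.yaml
# Hypothetical path: any directory that the Flink user can write to and
# that tmp cleaners (tmpwatch, systemd-tmpfiles, etc.) leave alone.
# When this key is unset, Flink falls back to the JVM's java.io.tmpdir,
# which is usually /tmp on Linux.
io.tmp.dirs: /var/lib/flink/tmp
```

The task manager needs a restart to pick this up, and the directory has to exist on every host that runs one.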
Take a look at the https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#configuring-temporary-io-directories article, I think it is relevant.

On Thu, Nov 1, 2018 at 8:51 PM Dmitry Minaev <mina...@gmail.com> wrote:

> Hi everyone,
>
> I'm having an issue when restarting a job in Flink. I'm doing a simple
> stop with savepoint and then start from the savepoint. Savepoints are
> stored in a separate folder, there is no configuration for "/tmp" folder in
> my setup. There is only 1 task manager and parallelism is 1.
>
> I'm getting FileNotFoundException:
>
> 31 Oct 2018 23:40:35,837 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - filter-business-metrics -> Sink: data_feed (1/1) (51ce53532932c33805291dc188d2f99e) switched from DEPLOYING to RUNNING.
> 31 Oct 2018 23:40:35,837 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - agents-working-on-interactions (1/1) (72a916158d07f2353fb270848d95ba2f) switched from DEPLOYING to RUNNING.
> 31 Oct 2018 23:40:35,929 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - interaction-details (1/1) (c004e64e90c0dbd3bc007459bc3d7420) switched from RUNNING to FAILED.
> java.io.FileNotFoundException: /tmp/flink-io-7bfd6603-c115-463d-bcfc-b97e31be5a37/f7ce787242e6afd91c3cbeccc2f74bc4a7dd0e6e600ff83e51bc5be9a95750f9.0.buffer (No such file or directory)
>     at java.io.RandomAccessFile.open0(Native Method)
>     at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>     at org.apache.flink.streaming.runtime.io.BufferSpiller.createSpillingChannel(BufferSpiller.java:259)
>     at org.apache.flink.streaming.runtime.io.BufferSpiller.<init>(BufferSpiller.java:120)
>     at org.apache.flink.streaming.runtime.io.BarrierBuffer.<init>(BarrierBuffer.java:149)
>     at org.apache.flink.streaming.runtime.io.StreamInputProcessor.<init>(StreamInputProcessor.java:129)
>     at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.init(OneInputStreamTask.java:56)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:235)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>     at java.lang.Thread.run(Thread.java:748)
>
> I've checked the logs and there are no errors prior to that. The job was
> stopped with no issues, and it was starting normally and passed multiple
> operators setting them to RUNNING state. But for several other operators it
> throws this FileNotFoundException.
>
> Any help is appreciated.
>
> --
> Regards,
> Dmitry