Thanks Vino & Piotr. Sure, I will upgrade the Flink version and monitor it to see if the problem still exists.
Thanks

On Mon, Jan 6, 2020 at 12:39 AM Piotr Nowojski <pi...@ververica.com> wrote:

> Hi,
>
> Off the top of my head I don't remember anything in particular; however,
> release 1.4.0 came with quite a lot of deep changes, which had their fair
> share of bugs that were subsequently fixed in later releases.
>
> Because the 1.4.x tree is no longer supported, I would strongly recommend
> first upgrading to a more recent Flink version. If that's not possible, I
> would at least upgrade to the latest release from the 1.4.x tree (1.4.2).
>
> Piotrek
>
> On 6 Jan 2020, at 07:25, vino yang <yanghua1...@gmail.com> wrote:
>
> Hi Navneeth,
>
> Since the file still exists, this exception is very strange.
>
> I want to ask: does it happen occasionally or frequently?
>
> Another concern is that since the 1.4 version is very old, maintenance
> and responses are not as timely as for the recent versions. I personally
> recommend upgrading as soon as possible.
>
> I can ping @Piotr Nowojski <pi...@ververica.com> and see if it is
> possible to explain the cause of this problem.
>
> Best,
> Vino
>
> Navneeth Krishnan <reachnavnee...@gmail.com> wrote on Sat, Jan 4, 2020 at 1:03 AM:
>
>> Thanks Congxian & Vino.
>>
>> Yes, the file does exist and I don't see any problem accessing it.
>>
>> Regarding Flink 1.9, we haven't migrated yet but we are planning to.
>> Since we have to test it, it might take some time.
>>
>> Thanks
>>
>> On Fri, Jan 3, 2020 at 2:14 AM Congxian Qiu <qcx978132...@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> Have you checked whether this problem also exists on Flink 1.9?
>>>
>>> Best,
>>> Congxian
>>>
>>> vino yang <yanghua1...@gmail.com> wrote on Fri, Jan 3, 2020 at 3:54 PM:
>>>
>>>> Hi Navneeth,
>>>>
>>>> Did you check whether the path contained in the exception really
>>>> cannot be found?
>>>>
>>>> Best,
>>>> Vino
>>>>
>>>> Navneeth Krishnan <reachnavnee...@gmail.com> wrote on Fri, Jan 3, 2020 at 8:23 AM:
>>>>
>>>>> Hi All,
>>>>>
>>>>> We are running into a checkpoint timeout issue more frequently in
>>>>> production, and we also see the exception below. We are running Flink
>>>>> 1.4.0 and the checkpoints are saved on NFS. Can someone suggest how to
>>>>> overcome this?
>>>>>
>>>>> java.lang.IllegalStateException: Could not initialize operator state backend.
>>>>>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:302)
>>>>>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:249)
>>>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692)
>>>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679)
>>>>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
>>>>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>>>>>     at java.lang.Thread.run(Thread.java:748)
>>>>> Caused by: java.io.FileNotFoundException: /mnt/checkpoints/02c4f8d5c11921f363b98c5959cc4f06/chk-101/e71d8eaf-ff4a-4783-92bd-77e3d8978e01 (No such file or directory)
>>>>>     at java.io.FileInputStream.open0(Native Method)
>>>>>     at java.io.FileInputStream.open(FileInputStream.java:195)
>>>>>     at java.io.FileInputStream.<init>(FileInputStream.java:138)
>>>>>     at org.apache.flink.core.fs.local.LocalDataInputStream.<init>(LocalDataInputStream.java:50)
>>>>>
>>>>> Thanks
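Vino's question above (whether the path in the exception really cannot be found) is worth checking on every TaskManager host, not just one: with checkpoints on NFS, a file visible on one machine may be missing or unreadable on another if the mount differs. A minimal sketch of such a check; the helper name `check_ckpt` is ours, and the demonstration uses a temporary file in place of a real checkpoint file:

```shell
#!/bin/sh
# Report whether a checkpoint file is readable on the current host.
check_ckpt() {
  if [ -r "$1" ]; then
    echo "readable"
  else
    echo "missing or unreadable"
  fi
}

# Demonstration with a temporary file standing in for a checkpoint file.
# In practice, pass the path from the FileNotFoundException (here,
# /mnt/checkpoints/02c4f8d5c11921f363b98c5959cc4f06/chk-101/...) and run
# this on every TaskManager host.
tmp=$(mktemp)
check_ckpt "$tmp"    # readable
rm -f "$tmp"
check_ckpt "$tmp"    # missing or unreadable
```

If the file checks out as readable everywhere, the problem is more likely a timing/consistency issue in the old 1.4.0 release, which supports the advice in the thread to upgrade.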