Hi, did you check whether there are any files at your specified HDFS output location? If yes, which files are there?
Cheers, Aljoscha > On 03 Mar 2016, at 14:29, Maximilian Bode <[email protected]> wrote: > > Just for the sake of completeness: this also happens when killing a task > manager and is therefore probably unrelated to job manager HA. > >> Am 03.03.2016 um 14:17 schrieb Maximilian Bode <[email protected]>: >> >> Hi everyone, >> >> unfortunately, I am running into another problem trying to establish exactly >> once guarantees (Kafka -> Flink 1.0.0-rc3 -> HDFS). >> >> When using >> >> RollingSink<Tuple3<Integer,Integer,String>> sink = new >> RollingSink<Tuple3<Integer,Integer,String>>("hdfs://our.machine.com:8020/hdfs/dir/outbound"); >> sink.setBucketer(new NonRollingBucketer()); >> output.addSink(sink); >> >> and then killing the job manager, the new job manager is unable to restore >> the old state throwing >> --- >> java.lang.Exception: Could not restore checkpointed state to operators and >> functions >> at >> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:454) >> at >> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:209) >> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) >> at java.lang.Thread.run(Thread.java:744) >> Caused by: java.lang.Exception: Failed to restore state to function: >> In-Progress file hdfs://our.machine.com:8020/hdfs/dir/outbound/part-5-0 was >> neither moved to pending nor is still in progress. >> at >> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:168) >> at >> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:446) >> ... 3 more >> Caused by: java.lang.RuntimeException: In-Progress file >> hdfs://our.machine.com:8020/hdfs/dir/outbound/part-5-0 was neither moved to >> pending nor is still in progress. >> at >> org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:686) >> at >> org.apache.flink.streaming.connectors.fs.RollingSink.restoreState(RollingSink.java:122) >> at >> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:165) >> ... 4 more >> --- >> I found a resolved issue [1] concerning Hadoop 2.7.1. We are in fact using >> 2.4.0 – might this be the same issue? >> >> Another thing I could think of is that the job is not configured correctly >> and there is some sort of timing issue. The checkpoint interval is 10 >> seconds, everything else was left at default value. Then again, as the >> NonRollingBucketer is used, there should not be any timing issues, right? >> >> Cheers, >> Max >> >> [1] https://issues.apache.org/jira/browse/FLINK-2979 >> >> — >> Maximilian Bode * Junior Consultant * [email protected] >> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring >> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke >> Sitz: Unterföhring * Amtsgericht München * HRB 135082 >> >
