We are using Flink 1.3.1 in standalone mode with an HA JobManager setup.

~ Karthik

On Fri, Oct 6, 2017 at 8:22 PM, Karthik Deivasigamani <karthi...@gmail.com> wrote:

> Hi,
> I'm noticing a weird issue with our Flink streaming job. We use the async
> I/O operator, which makes an HTTP call. In certain cases, when the async
> task times out, it throws an exception, causing the job to restart.
>
> java.lang.Exception: An async function call terminated with an exception. Failing the AsyncWaitOperator.
>     at org.apache.flink.streaming.api.operators.async.Emitter.output(Emitter.java:136)
>     at org.apache.flink.streaming.api.operators.async.Emitter.run(Emitter.java:83)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Async function call has timed out.
>     at org.apache.flink.runtime.concurrent.impl.FlinkFuture.get(FlinkFuture.java:110)
>
> After the job restarts (we have a fixed restart strategy), we notice that
> the checkpoints start failing continuously with this message:
> Checkpoint was declined (tasks not ready)
>
> [image: Inline image 1]
>
> But we see that the job is running: it's processing data, the accumulators
> we have are getting incremented, etc. Yet checkpointing fails with the
> "tasks not ready" message.
>
> Wanted to reach out to the community to see if anyone else has experienced
> this issue before.
> ~
> Karthik
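
For reference, one way to keep a slow HTTP call from failing the AsyncWaitOperator might be to enforce a client-side timeout that is shorter than the operator's timeout and emit a fallback value instead of letting the exception propagate. Below is a minimal sketch in plain Java 8, not our actual job code: the class name `TimeoutFallback`, the method `withFallback`, and the fallback value are hypothetical, and it assumes the HTTP client exposes its result as a `CompletableFuture`. It only addresses the restart trigger, not the declined checkpoints.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Flink.
public class TimeoutFallback {
    // Single daemon timer thread, used only to fire the fallback.
    private static final ScheduledExecutorService TIMER =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "timeout-fallback");
            t.setDaemon(true);
            return t;
        });

    /**
     * Returns a future that completes with the call's result, or with
     * `fallback` if the call fails or does not finish within `timeoutMs`.
     * The returned future never completes exceptionally.
     */
    public static <T> CompletableFuture<T> withFallback(
            CompletableFuture<T> call, long timeoutMs, T fallback) {
        CompletableFuture<T> result = new CompletableFuture<>();
        // If the call is too slow, the timer completes the result first.
        // complete() is a no-op when the result is already completed.
        TIMER.schedule(() -> { result.complete(fallback); },
                       timeoutMs, TimeUnit.MILLISECONDS);
        call.whenComplete((value, error) -> {
            if (error != null) {
                result.complete(fallback); // swallow the failure
            } else {
                result.complete(value);
            }
        });
        return result;
    }
}
```

Inside the async function, the value of the returned future would then be handed to the operator's result callback (in Flink 1.3 I believe that is `AsyncCollector.collect(...)`), so the operator always receives a result before its own timeout fires.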