Hi All,
I'm running Flink 1.6.0 on YARN. After the program has been running for about
a day, the Flink job fails on YARN with the error log shown below.
It appears to be caused by a timeout, but I have the following
questions:
1. At which step did the communication between Flink components fail, and
which two components are involved?
2. How can I solve this problem?
Thanks a lot!!
java.lang.Exception: Cannot deploy task LeftOuterJoin(where: (=(id, article_id)), join: (id, created_time, article_score, PU, article_id, CU, CN)) -> select: (id, created_time, article_score, PU, CU, CN) (2/2) (d403002a7accc5133cf89a386ddc1dfb) - TaskManager (container_1532509321420_463249_01_000002 @ sh-bs-3-i1-hadoop-17-225 (dataPort=10459)) not responding after a rpcTimeout of 10000 ms
    at org.apache.flink.runtime.executiongraph.Execution.lambda$deploy$5(Execution.java:601) ~[flink-runtime_2.11-1.6.0.jar:1.6.0]
    at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) ~[na:1.8.0_65]
    at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) ~[na:1.8.0_65]
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) ~[na:1.8.0_65]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_65]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_65]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_65]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@sh-bs-3-i1-hadoop-17-225:24213/user/taskmanager_0#-1762816591]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604) ~[akka-actor_2.11-2.4.20.jar:na]
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126) ~[akka-actor_2.11-2.4.20.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601) ~[scala-library-2.11.8.jar:na]
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[scala-library-2.11.8.jar:na]
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) ~[scala-library-2.11.8.jar:na]
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329) ~[akka-actor_2.11-2.4.20.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280) ~[akka-actor_2.11-2.4.20.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284) ~[akka-actor_2.11-2.4.20.jar:na]
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236) ~[akka-actor_2.11-2.4.20.jar:na]
    ... 1 common frames omitted
Best,
Henry