After submitting a job, I received a 'Failed to execute job' error. The gap
between job initialization and the start of scheduling was 214 s. What happened
during this period?
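For reference, the 214 s figure comes from the "Initializing job" entry (12:48:04,251) and the "Starting scheduling" entry (12:51:38,327) in the logs below. A minimal sketch of that arithmetic, assuming the default log4j timestamp layout shown in those entries (the helper name `gap_seconds` is only illustrative):

```python
from datetime import datetime

# Timestamp layout used in the JobManager log entries (comma before the millis).
FMT = "%Y-%m-%d %H:%M:%S,%f"


def gap_seconds(start: str, end: str) -> float:
    """Return the elapsed time in seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# "Initializing job Prediction Program" entry
initializing = "2021-04-14 12:48:04,251"
# "Starting scheduling with scheduling strategy" entry
scheduling = "2021-04-14 12:51:38,327"

gap = gap_seconds(initializing, scheduling)
print(f"{gap:.3f} s between initialization and scheduling")
```

The client-side exception is logged at 12:49:04, roughly 60 s after submission, which matches the `after [60000 ms]` in the AskTimeoutException; the JobMaster itself only starts scheduling at 12:51:38.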
Version: Flink 1.12.2
Deployment: Kubernetes (standalone)
Logs:
2021-04-14 12:47:58,547 WARN org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer [] - Property [transaction.timeout.ms] not specified. Setting it to 3600000 ms
2021-04-14 12:48:04,175 INFO org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Job 1276000e99efdb77bdae0df88ab91da3 is submitted.
2021-04-14 12:48:04,175 INFO org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Submitting Job with JobId=1276000e99efdb77bdae0df88ab91da3.
2021-04-14 12:48:04,249 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Received JobGraph submission 1276000e99efdb77bdae0df88ab91da3 (Prediction Program).
2021-04-14 12:48:04,249 INFO org.apache.flink.runtime.dispatcher.StandaloneDispatcher [] - Submitting job 1276000e99efdb77bdae0df88ab91da3 (Prediction Program).
2021-04-14 12:48:04,250 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for org.apache.flink.runtime.jobmaster.JobMaster at akka://flink/user/rpc/jobmanager_8 .
2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Initializing job Prediction Program (1276000e99efdb77bdae0df88ab91da3).
2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using restart back off time strategy NoRestartBackoffTimeStrategy for Prediction Program (1276000e99efdb77bdae0df88ab91da3).
2021-04-14 12:48:04,251 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Running initialization on master for job Prediction Program (1276000e99efdb77bdae0df88ab91da3).
2021-04-14 12:48:04,252 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Successfully ran initialization on master in 0 ms.
2021-04-14 12:48:04,254 INFO org.apache.flink.runtime.scheduler.adapter.DefaultExecutionTopology [] - Built 10 pipelined regions in 0 ms
2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using application-defined state backend: org.apache.flink.streaming.api.operators.sorted.state.BatchExecutionStateBackend@3ea8cd5a
2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No checkpoint found during restore.
2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@26845997 for Prediction Program (1276000e99efdb77bdae0df88ab91da3).
2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager runner for job Prediction Program (1276000e99efdb77bdae0df88ab91da3) was granted leadership with session id 00000000-0000-0000-0000-000000000000 at akka.tcp://flink@flink-jobmanager:6123/user/rpc/jobmanager_8.
2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting execution of job Prediction Program (1276000e99efdb77bdae0df88ab91da3) under job master id 00000000000000000000000000000000.
2021-04-14 12:48:04,255 INFO org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Starting split enumerator for source Source: TableSourceScan(table=[[default_catalog, default_database, cpu_util, filter=[], project=[instance_id, value, timestamp]]], fields=[instance_id, value, timestamp]) -> Calc(select=[instance_id, value, timestamp], where=[(timestamp > 1618145278)]) -> SinkConversionToDataPoint -> Map.
org.apache.flink.util.FlinkException: Failed to execute job 'Prediction Program'.
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1918)
    at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135)
    at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1782)
    at com.jd.app.StreamingJob.main(StreamingJob.java:265)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:219)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
    at org.apache.flink.client.deployment.application.DetachedApplicationRunner.tryExecuteJobs(DetachedApplicationRunner.java:84)
    at org.apache.flink.client.deployment.application.DetachedApplicationRunner.run(DetachedApplicationRunner.java:70)
    at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$handleRequest$0(JarRunHandler.java:102)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Error while waiting for job to be initialized
    at org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:160)
    at org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$submitAndGetJobClientFuture$2(EmbeddedExecutor.java:140)
    at org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:73)
    at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
    at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
    at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
    ... 1 more
Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Invocation of public default java.util.concurrent.CompletableFuture org.apache.flink.runtime.webmonitor.RestfulGateway.requestJobStatus(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) timed out.
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$null$0(EmbeddedExecutor.java:145)
    at org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:144)
    ... 6 more
Caused by: java.util.concurrent.TimeoutException: Invocation of public default java.util.concurrent.CompletableFuture org.apache.flink.runtime.webmonitor.RestfulGateway.requestJobStatus(org.apache.flink.api.common.JobID,org.apache.flink.api.common.time.Time) timed out.
    at com.sun.proxy.$Proxy26.requestJobStatus(Unknown Source)
    at org.apache.flink.client.deployment.application.executors.EmbeddedExecutor.lambda$null$0(EmbeddedExecutor.java:143)
    ... 7 more
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/rpc/dispatcher_1#1243668943]] after [60000 ms]. Message of type [org.apache.flink.runtime.rpc.messages.LocalFencedMessage]. A typical reason for `AskTimeoutException` is that the recipient actor didn't send a reply.
    at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
    at akka.pattern.PromiseActorRef$$anonfun$2.apply(AskSupport.scala:635)
    at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:648)
    at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:205)
    at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
    at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
    at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
    at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)
    at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)
    ... 1 more
2021-04-14 12:49:04,321 ERROR com.jd.app.StreamingJob [] - xxxx exec error org.apache.flink.util.FlinkException: Failed to execute job 'xxxxxx'.
2021-04-14 12:51:38,327 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Starting scheduling with scheduling strategy [org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
2021-04-14 12:51:38,328 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Job Prediction Program (1276000e99efdb77bdae0df88ab91da3) switched from state CREATED to RUNNING.
2021-04-14 12:51:38,328 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: TableSourceScan(table=[[default_catalog, default_database, cpu_util, filter=[], project=[instance_id, value, timestamp]]], fields=[instance_id, value, timestamp]) -> Calc(select=[instance_id, value, timestamp], where=[(timestamp > 1618145278)]) -> SinkConversionToDataPoint -> Map (1/5) (52ad5c769b4836498fadf954d5921401) switched from CREATED to SCHEDULED.
2021-04-14 12:51:38,328 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl [] - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{90a7db543b771ed399f0b2b0414ef288}]
2021-04-14 12:51:38,328 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph [] - Source: TableSourceScan(table=[[default_catalog, default_database, cpu_util, filter=[], project=[instance_id, value, timestamp]]], fields=[instance_id, value, timestamp]) -> Calc(select=[instance_id, value, timestamp], where=[(timestamp > 1618145278)]) -> SinkConversionToDataPoint -> Map (2/5) (1f877463154f27d6f0aa7a9af9c2f64b) switched from CREATED to SCHEDULED.