You will have to take a look at the JobManager/TaskManager logs.
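The client-side trace below only carries the decline reason, so the underlying exception, if it was logged, would be on the cluster side. On YARN with log aggregation enabled, `yarn logs -applicationId <application id>` is the usual way to fetch those logs. As a rough sketch for narrowing them down, the following Python helper pulls out lines mentioning savepoints or the checkpoint coordinator, plus a few lines of context. The search pattern and the function name `find_matches` are assumptions for illustration, not anything Flink provides.

```python
import re
from typing import Iterable, List

# Assumed search terms; adjust them to whatever the real logs contain.
PATTERN = re.compile(
    r"savepoint|CheckpointCoordinator|CheckpointTriggerException",
    re.IGNORECASE,
)


def find_matches(lines: Iterable[str], context: int = 3) -> List[List[str]]:
    """Return each matching line together with up to `context` following lines."""
    buffered = [line.rstrip("\n") for line in lines]
    hits = []
    for i, line in enumerate(buffered):
        if PATTERN.search(line):
            # Keep the lines after the match, since a logged exception
            # and its stack trace normally follow the log message.
            hits.append(buffered[i:i + 1 + context])
    return hits
```

To use it, save the output of the `yarn logs` command to a file, open that file, and pass the file object to `find_matches`. Each returned block starts at a matching line, so overlapping blocks are possible when matches are close together.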
On 04.09.2018 12:02, Paul Lam wrote:
Hi,
I’m using Flink 1.5.3 and failed to trigger a savepoint for a Flink on
YARN job. The stack trace shows that an exception occurred while
triggering the checkpoint, but the job's regular checkpoints are
running fine.
What could possibly be the problem? Thanks a lot!
The stack trace is as follows:
org.apache.flink.util.FlinkException: Triggering a savepoint for the job 1ca7d429484c64eb64fa646672389a74 failed.
at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:695)
at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$7(CliFrontend.java:673)
at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:670)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1040)
at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to trigger savepoint. Decline reason: An Exception occurred while triggering the checkpoint.
at org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$13(JobMaster.java:955)
at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
at java.util.concurrent.CompletableFuture.uniExceptionallyStage(CompletableFuture.java:884)
at java.util.concurrent.CompletableFuture.exceptionally(CompletableFuture.java:2196)
at org.apache.flink.runtime.jobmaster.JobMaster.triggerSavepoint(JobMaster.java:951)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to trigger savepoint. Decline reason: An Exception occurred while triggering the checkpoint.
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:614)
at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1983)
at org.apache.flink.runtime.jobmaster.JobMaster.triggerSavepoint(JobMaster.java:943)
... 21 more
Caused by: org.apache.flink.runtime.checkpoint.CheckpointTriggerException: Failed to trigger savepoint. Decline reason: An Exception occurred while triggering the checkpoint.
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.triggerSavepoint(CheckpointCoordinator.java:377)
at org.apache.flink.runtime.jobmaster.JobMaster.triggerSavepoint(JobMaster.java:942)
... 21 more
Best,
Paul Lam