Hi! Thanks for reporting this.
This looks like a bug that we fixed in Flink 1.7.1 [1].

Would you be able to try with 1.7.1 and see if the issue is still happening for you?

Cheers,
Gordon

[1] https://issues.apache.org/jira/browse/FLINK-11094

On Tue, Jan 29, 2019, 6:29 PM Averell <lvhu...@gmail.com> wrote:

> I tried to create a savepoint on HDFS, and got the same exception:
>
> ------------------------------------------------------------
> The program finished with the following exception:
>
> org.apache.flink.util.FlinkException: Triggering a savepoint for the job 028e392d02bd229ed08f50a2da5227e2 failed.
>     at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:723)
>     at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:701)
>     at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:985)
>     at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:698)
>     at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1065)
>     at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>     at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>     at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
> Caused by: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: java.lang.Exception: Checkpoint failed: Could not perform checkpoint 35 for operator Merge sourceA&sourceB (7/16).
>     at org.apache.flink.runtime.jobmaster.JobMaster.lambda$triggerSavepoint$14(JobMaster.java:970)
>     at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>     at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>     at org.apache.flink.runtime.checkpoint.PendingCheckpoint.abortWithCause(PendingCheckpoint.java:452)
>     at org.apache.flink.runtime.checkpoint.PendingCheckpoint.abortError(PendingCheckpoint.java:447)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.discardCheckpoint(CheckpointCoordinator.java:1258)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.failUnacknowledgedPendingCheckpointsFor(CheckpointCoordinator.java:918)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.notifyExecutionChange(ExecutionGraph.java:1779)
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.notifyStateTransition(ExecutionVertex.java:756)
>     at org.apache.flink.runtime.executiongraph.Execution.transitionState(Execution.java:1353)
>     at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1113)
>     at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:945)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:1576)
>     at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:542)
>     at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.util.concurrent.CompletionException: java.lang.Exception: Checkpoint failed: Could not perform checkpoint 35 for operator Merge sourceA&sourceB (7/16).
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>     ... 33 more
> Caused by: java.lang.Exception: Checkpoint failed: Could not perform checkpoint 35 for operator Merge sourceA&sourceB (7/16).
>     ... 30 more
> Caused by: java.lang.Exception: Could not perform checkpoint 35 for operator Merge sourceA&sourceB (7/16).
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:595)
>     at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
>     at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
>     at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
>     at org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:273)
>     at org.apache.flink.streaming.runtime.tasks.TwoInputStreamTask.run(TwoInputStreamTask.java:117)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:704)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.Exception: Could not complete snapshot 35 for operator Merge sourceA&sourceB (7/16).
>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:422)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
>     ... 8 more
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.contrib.streaming.state.snapshot.RocksFullSnapshotStrategy.doSnapshot(RocksFullSnapshotStrategy.java:130)
>     at org.apache.flink.contrib.streaming.state.snapshot.RocksDBSnapshotStrategyBase.snapshot(RocksDBSnapshotStrategyBase.java:128)
>     at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.snapshot(RocksDBKeyedStateBackend.java:496)
>     at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:407)
>     ... 13 more
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/