[ https://issues.apache.org/jira/browse/FLINK-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16681680#comment-16681680 ]
Andrey Zagrebin commented on FLINK-10482:
-----------------------------------------

[~JBiason] Do you still have the full log? Can you post it here?
Do you see something like this in it:
_Incremented the completed number of checkpoints without incrementing the in progress checkpoints before._ ?

> java.lang.IllegalArgumentException: Negative number of in progress checkpoints
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-10482
>                 URL: https://issues.apache.org/jira/browse/FLINK-10482
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.6.1
>            Reporter: Julio Biason
>            Priority: Major
>             Fix For: 1.8.0
>
>
> Recently I found the following entry in my JobManager log:
> {noformat}
> 2018-10-02 17:44:50,090 [flink-akka.actor.default-dispatcher-4117] ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler - Implementation error: Unhandled exception.
> java.lang.IllegalArgumentException: Negative number of in progress checkpoints
>     at org.apache.flink.util.Preconditions.checkArgument(Preconditions.java:139)
>     at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.<init>(CheckpointStatsCounts.java:72)
>     at org.apache.flink.runtime.checkpoint.CheckpointStatsCounts.createSnapshot(CheckpointStatsCounts.java:177)
>     at org.apache.flink.runtime.checkpoint.CheckpointStatsTracker.createSnapshot(CheckpointStatsTracker.java:166)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.getCheckpointStatsSnapshot(ExecutionGraph.java:553)
>     at org.apache.flink.runtime.executiongraph.ArchivedExecutionGraph.createFrom(ArchivedExecutionGraph.java:340)
>     at org.apache.flink.runtime.jobmaster.JobMaster.requestJob(JobMaster.java:923)
>     at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
>     at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
>     at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
>     at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
>     at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
>     at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
>     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
>     at akka.actor.ActorCell.invoke(ActorCell.scala:495)
>     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
>     at akka.dispatch.Mailbox.run(Mailbox.scala:224)
>     at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {noformat}
> Related: The job details don't appear; the screen shows only the skeleton,
> with no information (such as the pipeline, subtasks, etc.).
> One thing that may have caused this is that the job was failing (an uncaught
> exception in our code) and, during one of its restarts, I issued a "flink
> cancel <jobid>". The job was cancelled, but the JobManager interface took a
> very long time to show the slots as available again.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)