[ https://issues.apache.org/jira/browse/FLINK-7351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110777#comment-16110777 ]
Nico Kruber commented on FLINK-7351:
------------------------------------

Did another run with the following snippet, and the failure is reproducible (even locally):

{code}
private static final Logger LOG = LoggerFactory.getLogger(JobClientActorRecoveryITCase.class);

// Repeat the existing test to provoke the intermittent failure.
@Test
public void testJobClientRecovery1000() throws Exception {
    for (int i = 0; i < 1000; ++i) {
        LOG.info("starting test run " + i);
        testJobClientRecovery();
    }
}
{code}

{code}
12:17:38,304 INFO org.apache.flink.runtime.blob.FileSystemBlobStore - Creating highly available BLOB storage directory at /tmp/junit9004724949110959230/recovery//default/blob 12:17:38,304 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Enforcing default ACL for ZK connections 12:17:38,304 INFO org.apache.flink.runtime.util.ZooKeeperUtils - Using '/flink/default' as Zookeeper namespace. 12:17:38,304 INFO org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting 12:17:38,348 INFO org.apache.flink.runtime.minicluster.FlinkMiniCluster - Disabled queryable state server 12:17:38,348 INFO org.apache.flink.runtime.minicluster.FlinkMiniCluster - Starting FlinkMiniCluster. 12:17:38,348 INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED 12:17:38,354 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 12:17:38,355 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-a2ef16c3-6223-45a6-913b-748781acdb2d 12:17:38,356 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:35687 - max concurrent requests: 50 - max backlog: 1000 12:17:38,356 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported.
12:17:38,357 INFO org.apache.flink.runtime.testingUtils.TestingMemoryArchivist - Started memory archivist akka://flink/user/archive_1 12:17:38,359 INFO org.apache.flink.runtime.blob.BlobServer - Created BLOB server storage directory /tmp/blobStore-bc758698-bd94-4802-95d5-da4c6d856883 12:17:38,359 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Starting JobManager at akka://flink/user/jobmanager_1. 12:17:38,359 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@5cb0f0ab. 12:17:38,359 INFO org.apache.flink.runtime.blob.BlobServer - Started BLOB server at 0.0.0.0:32853 - max concurrent requests: 50 - max backlog: 1000 12:17:38,359 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported. 12:17:38,360 INFO org.apache.flink.runtime.testingUtils.TestingMemoryArchivist - Started memory archivist akka://flink/user/archive_2 12:17:38,363 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService. 12:17:38,363 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Grant leadership to contender akka://flink/user/jobmanager_1 with session ID 3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,363 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Starting JobManager at akka://flink/user/jobmanager_2. 12:17:38,363 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Starting ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@527cd5eb. 12:17:38,363 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - JobManager akka://flink/user/jobmanager_1 was granted leadership with leader session ID Some(3f4d9edf-5fa7-48c4-85ae-15bed36d46e4). 
12:17:38,363 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Confirm leader session ID 3f4d9edf-5fa7-48c4-85ae-15bed36d46e4 for leader akka://flink/user/jobmanager_1. 12:17:38,364 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService. 12:17:38,365 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Leader node changed while akka://flink/user/jobmanager_1 is the leader with session ID null. 12:17:38,363 INFO org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have a max timeout of 100000 ms 12:17:38,365 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Write leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,365 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file directory '/tmp': total 9 GB, usable 6 GB (66.67% usable) 12:17:38,415 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Attempting to recover job efa7affb9fafdf7b682886f80a3bdeff. 12:17:38,416 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Successfully wrote leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,416 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Delaying recovery of all jobs by 10000 milliseconds. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 
12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,418 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,419 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,419 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,419 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Recovered SubmittedJobGraph(efa7affb9fafdf7b682886f80a3bdeff, JobInfo(clients: Set((Actor[akka://flink/user/$a#235524161],EXECUTION_RESULT_AND_STATE_CHANGES)), start: 1501676245446)). 12:17:38,419 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 
12:17:38,419 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Submitting recovered job efa7affb9fafdf7b682886f80a3bdeff. 12:17:38,419 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Submitting job efa7affb9fafdf7b682886f80a3bdeff (Blocking Test Job) (Recovery). 12:17:38,419 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Trying to associate with JobManager leader akka://flink/user/jobmanager_1 12:17:38,419 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Using restart strategy NoRestartStrategy for efa7affb9fafdf7b682886f80a3bdeff. 12:17:38,419 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Leader node changed while akka://flink/user/jobmanager_1 is the leader with session ID 3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart 12:17:38,420 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Running initialization on master for job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff). 12:17:38,420 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Successfully ran initialization on master in 0 ms. 12:17:38,420 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager_1#-1382860260] - leader session 3f4d9edf-5fa7-48c4-85ae-15bed36d46e4 12:17:38,420 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,420 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Scheduling job efa7affb9fafdf7b682886f80a3bdeff (Blocking Test Job). 12:17:38,420 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 
12:17:38,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) switched from state CREATED to RUNNING. 12:17:38,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Blocking Vertex (1/1) (424ef083d9189043385f9f2b855aeb21) switched from CREATED to SCHEDULED. 12:17:38,420 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Blocking Vertex (1/1) (424ef083d9189043385f9f2b855aeb21) switched from SCHEDULED to FAILED. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,421 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) switched from state RUNNING to FAILING. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. 
Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,421 
INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Try to restart or fail the job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) if no longer possible. 12:17:38,421 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) switched from state FAILING to FAILED. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,422 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Blocking Test Job (efa7affb9fafdf7b682886f80a3bdeff) because the restart strategy prevented it. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. 
Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,446 
INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Removed job graph efa7affb9fafdf7b682886f80a3bdeff from ZooKeeper. 12:17:38,488 INFO org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 197 MB for network buffer pool (number of memory segments: 6307, bytes per segment: 32768). 12:17:38,488 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the network environment and its components. 12:17:38,488 INFO org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting managed memory to 621 MB, memory will be allocated lazily. 12:17:38,489 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager uses directory /tmp/flink-io-54a003b9-6f13-49c8-a585-238ec48019c5 for spill files. 12:17:38,489 INFO org.apache.flink.runtime.metrics.MetricRegistry - No metrics reporter configured, no metrics will be exposed/reported. 12:17:38,489 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-b7357f57-18a2-40ad-8448-b251ad3af109 12:17:38,490 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-d3abe96c-6ec6-416e-9418-b1127cf407de 12:17:38,490 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Starting TaskManager actor at akka://flink/user/taskmanager_1#2009077452. 12:17:38,490 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - TaskManager data connection information: c9a2fe7403e005322f998f352bbe5be5 @ localhost (dataPort=-1) 12:17:38,490 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - TaskManager has 1 task slot(s). 12:17:38,490 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Memory usage stats: [HEAP: 207/247/1979 MB, NON HEAP: 43/44/-1 MB (used/committed/max)] 12:17:38,490 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService. 
12:17:38,492 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,492 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_1, session ID=3f4d9edf-5fa7-48c4-85ae-15bed36d46e4. 12:17:38,493 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Trying to register at JobManager akka://flink/user/jobmanager_1 (attempt 1, timeout: 500 milliseconds) 12:17:38,493 INFO org.apache.flink.runtime.testutils.TestingResourceManager - TaskManager c9a2fe7403e005322f998f352bbe5be5 has started. 12:17:38,493 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at localhost (akka://flink/user/taskmanager_1) as cd09541e56c8913613dc9a58f61d304a. Current number of registered hosts is 1. Current number of alive task slots is 1. 12:17:38,493 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Successful registration at JobManager (akka://flink/user/jobmanager_1), starting network stack and library cache. 12:17:38,493 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Determined BLOB server address to be localhost/127.0.0.1:35687. Starting BLOB cache. 12:17:38,493 INFO org.apache.flink.runtime.blob.BlobCache - Created BLOB cache storage directory /tmp/blobStore-9364d0ae-7fe4-45d9-93c9-d978fae7caa8 12:17:38,495 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Stopping ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@5cb0f0ab. 12:17:38,495 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - TaskManager akka://flink/user/taskmanager_1 disconnects from JobManager akka://flink/user/jobmanager_1: JobManager is no longer reachable 12:17:38,495 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Starting ZooKeeperLeaderRetrievalService. 
12:17:38,495 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Disassociating from JobManager 12:17:38,496 INFO org.apache.flink.runtime.blob.BlobCache - Shutting down BlobCache 12:17:38,496 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Received SubmitJobAndWait(JobGraph(jobId: 41b8348843eb617e608df4f200590f37)) but there is no connection to a JobManager yet. 12:17:38,496 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Received job Blocking Test Job (41b8348843eb617e608df4f200590f37). 12:17:38,497 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Trying to register at JobManager akka://flink/user/jobmanager_1 (attempt 1, timeout: 500 milliseconds) 12:17:38,498 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Grant leadership to contender akka://flink/user/jobmanager_2 with session ID a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,498 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - JobManager akka://flink/user/jobmanager_2 was granted leadership with leader session ID Some(a1124fe4-7739-452a-8b74-ee2b3fb7dad0). 12:17:38,498 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Confirm leader session ID a1124fe4-7739-452a-8b74-ee2b3fb7dad0 for leader akka://flink/user/jobmanager_2. 12:17:38,498 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Write leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,503 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Successfully wrote leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,503 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 
12:17:38,503 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Delaying recovery of all jobs by 10000 milliseconds. 12:17:38,503 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,503 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Disconnect from JobManager null. 12:17:38,504 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Connect to JobManager Actor[akka://flink/user/jobmanager_2#2090247331]. 12:17:38,504 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Connected to JobManager at Actor[akka://flink/user/jobmanager_2#2090247331] with leader session id a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,504 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Sending message to JobManager akka://flink/user/jobmanager_2 to submit job Blocking Test Job (41b8348843eb617e608df4f200590f37) and wait for progress 12:17:38,505 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Upload jar files to job manager akka://flink/user/jobmanager_2. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,505 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Submit job to the job manager akka://flink/user/jobmanager_2. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,505 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Submitting job 41b8348843eb617e608df4f200590f37 (Blocking Test Job). 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 
12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,505 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Using restart strategy NoRestartStrategy for 41b8348843eb617e608df4f200590f37. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,505 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,506 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job recovers via failover strategy: full graph restart 12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,506 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 
12:17:38,506 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Running initialization on master for job Blocking Test Job (41b8348843eb617e608df4f200590f37). 12:17:38,506 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Associated JobManager Actor[akka://flink/user/jobmanager_1#-1382860260] lost leader status 12:17:38,506 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Successfully ran initialization on master in 0 ms. 12:17:38,506 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Trying to associate with JobManager leader akka://flink/user/jobmanager_2 12:17:38,507 DEBUG org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Leader node changed while akka://flink/user/jobmanager_2 is the leader with session ID a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,507 INFO org.apache.flink.runtime.testutils.TestingResourceManager - Resource Manager associating with leading JobManager Actor[akka://flink/user/jobmanager_2#2090247331] - leader session a1124fe4-7739-452a-8b74-ee2b3fb7dad0 12:17:38,507 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,508 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,509 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Added SubmittedJobGraph(41b8348843eb617e608df4f200590f37, JobInfo(clients: Set((Actor[akka://flink/user/$a#282433225],EXECUTION_RESULT_AND_STATE_CHANGES)), start: 1501676258505)) to ZooKeeper. 12:17:38,510 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Scheduling job 41b8348843eb617e608df4f200590f37 (Blocking Test Job). 12:17:38,510 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (41b8348843eb617e608df4f200590f37) switched from state CREATED to RUNNING. 
12:17:38,510 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Job 41b8348843eb617e608df4f200590f37 was successfully submitted to the JobManager akka://flink/deadLetters. 12:17:38,510 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Blocking Vertex (1/1) (19c331a32d8716bc6cd6d5bf7d1f02dd) switched from CREATED to SCHEDULED. 12:17:38,510 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Job execution switched to status RUNNING. 12:17:38,510 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Blocking Vertex(1/1) switched to SCHEDULED 12:17:38,510 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Blocking Vertex (1/1) (19c331a32d8716bc6cd6d5bf7d1f02dd) switched from SCHEDULED to FAILED. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. 
Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,589 
INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (41b8348843eb617e608df4f200590f37) switched from state RUNNING to FAILING. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,589 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Blocking Vertex(1/1) switched to FAILED org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,589 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Leader node has changed. 12:17:38,589 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Job execution switched to status FAILING. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. 
Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,589 
INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Try to restart or fail the job Blocking Test Job (41b8348843eb617e608df4f200590f37) if no longer possible. 12:17:38,590 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Blocking Test Job (41b8348843eb617e608df4f200590f37) switched from state FAILING to FAILED. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,590 DEBUG org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - New leader information: Leader=akka://flink/user/jobmanager_2, session ID=a1124fe4-7739-452a-8b74-ee2b3fb7dad0. 12:17:38,590 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Could not restart the job Blocking Test Job (41b8348843eb617e608df4f200590f37) because the restart strategy prevented it. org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. 
Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 12:17:38,591 
INFO org.apache.flink.runtime.client.JobSubmissionClientActor - 08/02/2017 12:17:38 Job execution switched to status FAILED. 12:17:38,591 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Trying to register at JobManager akka://flink/user/jobmanager_2 (attempt 1, timeout: 500 milliseconds) 12:17:38,591 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Terminate JobClientActor. 12:17:38,591 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Disconnect from JobManager Actor[akka://flink/user/jobmanager_2#2090247331]. 12:17:38,591 INFO org.apache.flink.runtime.client.JobClient - Job execution failed 12:17:38,591 INFO org.apache.flink.runtime.instance.InstanceManager - Registered TaskManager at localhost (akka://flink/user/taskmanager_1) as 7b417742cd33c7f2e146a52a7e5597b9. Current number of registered hosts is 1. Current number of alive task slots is 1. 12:17:38,592 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService. 12:17:38,593 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Stopping TaskManager akka://flink/user/taskmanager_1#2009077452. 12:17:38,593 INFO org.apache.flink.runtime.testingUtils.TestingJobManager - Stopping JobManager akka://flink/user/jobmanager_2. 12:17:38,593 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService. 12:17:38,593 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService. 12:17:38,593 INFO org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager removed spill file directory /tmp/flink-io-54a003b9-6f13-49c8-a585-238ec48019c5 12:17:38,593 INFO org.apache.flink.runtime.io.network.NetworkEnvironment - Shutting down the network environment and its components. 
12:17:38,594 INFO org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - Removed job graph 41b8348843eb617e608df4f200590f37 from ZooKeeper. 12:17:38,594 INFO org.apache.flink.runtime.testingUtils.TestingTaskManager - Task manager akka://flink/user/taskmanager_1 is completely shut down. 12:17:38,594 INFO org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Stopping ZooKeeperLeaderElectionService org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService@527cd5eb. 12:17:38,595 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:32853 12:17:38,596 ERROR org.apache.flink.runtime.client.JobClientActorRecoveryITCase - -------------------------------------------------------------------------------- Test testJobClientRecovery1000(org.apache.flink.runtime.client.JobClientActorRecoveryITCase) failed with: org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused 
by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0 at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334) at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139) at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368) at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309) at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834) at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372) ... 
8 more
{code}

> test instability in JobClientActorRecoveryITCase#testJobClientRecovery
> ----------------------------------------------------------------------
>
> Key: FLINK-7351
> URL: https://issues.apache.org/jira/browse/FLINK-7351
> Project: Flink
> Issue Type: Bug
> Components: Job-Submission, Tests
> Affects Versions: 1.3.2
> Reporter: Nico Kruber
> Priority: Minor
> Labels: test-stability
>
> On a 16-core VM, the following test failed during {{mvn clean verify}}
> {code}
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 22.814 sec <<< FAILURE! - in org.apache.flink.runtime.client.JobClientActorRecoveryITCase
> testJobClientRecovery(org.apache.flink.runtime.client.JobClientActorRecoveryITCase) Time elapsed: 21.299 sec <<< ERROR!
> org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply$mcV$sp(JobManager.scala:933)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$7.apply(JobManager.scala:876)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Resources available to scheduler: Number of instances=0, total number of slots=0, available slots=0
>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:334)
>     at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:139)
>     at org.apache.flink.runtime.executiongraph.Execution.allocateSlotForExecution(Execution.java:368)
>     at org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:309)
>     at org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:596)
>     at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.scheduleAll(ExecutionJobVertex.java:450)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleLazy(ExecutionGraph.java:834)
>     at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:814)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1425)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
>     at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1372)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>     at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>     at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
>     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)