Hi, I'm currently submitting 50 separate jobs to a cluster set up with 50 TaskManagers, 1 slot each. Each job has a parallelism of 1. There's plenty of space left in my cluster and on that node, so it's not clear to me what's happening. Any pointers?
On the client side, when I try to execute, I see the following:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the jar files to the job manager.
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
    at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
    at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
    at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
    at com.gs.ep.da.lake.refinerlib.flink.FlowData.execute(FlowData.java:143)
    at com.gs.ep.da.lake.refinerlib.flink.FlowData.flowPartialIngestionHalf(FlowData.java:107)
    at com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:72)
    at com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:39)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Could not upload the jar files to the job manager.
    at org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:150)
    at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:95)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Could not retrieve the JobManager's blob port.
    at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:745)
    at org.apache.flink.runtime.jobgraph.JobGraph.uploadUserJars(JobGraph.java:565)
    at org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:148)
    ... 9 more
Caused by: java.io.IOException: PUT operation failed: Connection reset
    at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:512)
    at org.apache.flink.runtime.blob.BlobClient.put(BlobClient.java:374)
    at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:771)
    at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:740)
    ... 11 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:499)
    ... 14 more

On the job manager logs I see this:

2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
    at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
    at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:345)
    at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
    at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection - PUT operation failed
java.io.IOException: No space left on device

Regina Chan
Goldman Sachs - Enterprise Platforms, Data Architecture
30 Hudson Street, 37th floor | Jersey City, NY 07302
(212) 902-5697