Hi,

I'm currently submitting 50 separate jobs to a setup with 50 TaskManagers,
each with 1 slot. Each job has a parallelism of 1. There's plenty of space
left in my cluster and on that node. It's not clear to me what's happening.
Any pointers?

On the client side, when I try to execute, I see the following:
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the jar files to the job manager.
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
        at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
        at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
        at com.gs.ep.da.lake.refinerlib.flink.FlowData.execute(FlowData.java:143)
        at com.gs.ep.da.lake.refinerlib.flink.FlowData.flowPartialIngestionHalf(FlowData.java:107)
        at com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:72)
        at com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:39)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Could not upload the jar files to the job manager.
        at org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:150)
        at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:95)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Could not retrieve the JobManager's blob port.
        at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:745)
        at org.apache.flink.runtime.jobgraph.JobGraph.uploadUserJars(JobGraph.java:565)
        at org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:148)
        ... 9 more
Caused by: java.io.IOException: PUT operation failed: Connection reset
        at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:512)
        at org.apache.flink.runtime.blob.BlobClient.put(BlobClient.java:374)
        at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:771)
        at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:740)
        ... 11 more
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:499)
        ... 14 more


In the job manager logs I see this:

2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - PUT operation failed
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
        at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - PUT operation failed
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
        at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - PUT operation failed
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
        at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - PUT operation failed
java.io.IOException: No space left on device




Regina Chan
Goldman Sachs - Enterprise Platforms, Data Architecture
30 Hudson Street, 37th floor | Jersey City, NY 07302 *  (212) 902-5697
