If it helps, I'm running Flink 1.2.1. I saw this ticket: 
https://issues.apache.org/jira/browse/FLINK-5828. The failure only started when 
I ran all 50 flows at the same time. However, it looks like the problem is not 
with creating the cache directory but with running out of space there? 
But what's in there is also tiny.

bash-4.1$ hdfs dfs -du -h hdfs://d191291/user/delp/.flink/application_1510733430616_2098853
1.1 K    hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/5c71e4b6-2567-4d34-98dc-73b29c502736-taskmanager-conf.yaml
1.4 K    hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/flink-conf.yaml
93.5 M   hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/flink-dist_2.10-1.2.1.jar
264.8 M  hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/lib
1.9 K    hdfs://d191291/user/delp/.flink/application_1510733430616_2098853/log4j.properties
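
One thing that may be worth checking: the java.io.FileOutputStream frames in
the JobManager log below point at a write to a local disk rather than HDFS, so
the HDFS listing above may not be the directory that is filling up. Below is a
minimal sketch for checking free bytes and free inodes on a local directory of
the JobManager host. It assumes the blob server stages uploads under the
directory set by blob.storage.directory, or the JVM temp dir if that is unset;
I have not confirmed that for this setup, and the helper names disk_report and
has_room are made up for illustration.

```python
import os
import shutil
import tempfile


def disk_report(path):
    """Return byte and inode figures for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    stats = os.statvfs(path)  # Unix only
    return {
        "path": path,
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "total_inodes": stats.f_files,
        "free_inodes": stats.f_favail,
    }


def has_room(report, needed_bytes):
    """True if the filesystem can take `needed_bytes` and still has inodes."""
    # Some filesystems report 0 total inodes; treat that as "not reported".
    inodes_ok = report["total_inodes"] == 0 or report["free_inodes"] > 0
    return report["free_bytes"] >= needed_bytes and inodes_ok


if __name__ == "__main__":
    # Assumption: replace this with the JobManager's actual blob directory.
    candidate = tempfile.gettempdir()
    print(disk_report(candidate))
```

Running it on the JobManager host while the 50 submissions are in flight would
show whether that filesystem is out of bytes or out of inodes. "No space left
on device" can be raised in either case.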


From: Chan, Regina [Tech]
Sent: Tuesday, December 12, 2017 1:56 AM
To: 'user@flink.apache.org'
Subject: ProgramInvocationException: Could not upload the jar files to the job 
manager / No space left on device

Hi,

I'm currently submitting 50 separate jobs to a setup with 50 TaskManagers, 
1 slot each. Each job has a parallelism of 1. There's plenty of space left in 
my cluster and on that node. It's not clear to me what's happening. Any pointers?

On the client side, when I try to execute, I see the following:
org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Could not upload the jar files to the job manager.
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
        at org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:101)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
        at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:387)
        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
        at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926)
        at com.gs.ep.da.lake.refinerlib.flink.FlowData.execute(FlowData.java:143)
        at com.gs.ep.da.lake.refinerlib.flink.FlowData.flowPartialIngestionHalf(FlowData.java:107)
        at com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:72)
        at com.gs.ep.da.lake.refinerlib.flink.FlowData.call(FlowData.java:39)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Could not upload the jar files to the job manager.
        at org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:150)
        at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:95)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.IOException: Could not retrieve the JobManager's blob port.
        at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:745)
        at org.apache.flink.runtime.jobgraph.JobGraph.uploadUserJars(JobGraph.java:565)
        at org.apache.flink.runtime.client.JobSubmissionClientActor$1.call(JobSubmissionClientActor.java:148)
        ... 9 more
Caused by: java.io.IOException: PUT operation failed: Connection reset
        at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:512)
        at org.apache.flink.runtime.blob.BlobClient.put(BlobClient.java:374)
        at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:771)
        at org.apache.flink.runtime.blob.BlobClient.uploadJarFiles(BlobClient.java:740)
        ... 11 more
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:118)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.flink.runtime.blob.BlobClient.putInputStream(BlobClient.java:499)
        ... 14 more


In the JobManager logs I see this:

2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - PUT operation failed
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
        at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - PUT operation failed
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
        at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - PUT operation failed
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at org.apache.flink.runtime.blob.BlobServerConnection.put(BlobServerConnection.java:314)
        at org.apache.flink.runtime.blob.BlobServerConnection.run(BlobServerConnection.java:113)
2017-12-12 01:42:47,608 ERROR org.apache.flink.runtime.blob.BlobServerConnection            - PUT operation failed
java.io.IOException: No space left on device




Regina Chan
Goldman Sachs - Enterprise Platforms, Data Architecture
30 Hudson Street, 37th floor | Jersey City, NY 07302 *  (212) 902-5697
