[ https://issues.apache.org/jira/browse/FLINK-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15495900#comment-15495900 ]

ASF GitHub Bot commented on FLINK-4485:
---------------------------------------

Github user mxm commented on the issue:

    https://github.com/apache/flink/pull/2499
  
    Thanks! A few words for @nielsbasjes, who reported the issue: I have
    tested the fix using the test instructions you provided. Even before this
    fix, I could get rid of the temp files by forcing a manual garbage
    collection on the JVM with `jcmd <pid> GC.run`. However, that only worked
    once the job metadata had been removed from the archive, i.e., once the
    job no longer showed up in the web interface. With this fix, the class
    loader is cleared upon job completion and the files are removed
    immediately; `lsof | fgrep blob_` no longer showed any of these files.
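The mechanism behind the fix can be illustrated with a minimal sketch. It assumes a Linux host, because it counts this process's descriptors by reading `/proc/self/fd`, which is roughly what `lsof` reports. A `URLClassLoader` that has opened a jar keeps the file descriptor even after the jar is deleted, and `close()` releases it. The class `ClassLoaderHandleDemo`, the helper `countOpenHandles`, the `blob_` temp-file prefix and the `hello.txt` entry are all hypothetical stand-ins for a job jar in the BLOB store; this is not Flink's actual library-cache code.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class ClassLoaderHandleDemo {

    // Hypothetical helper (Linux only): counts this process's open descriptors whose
    // target starts with `path`. Deleted targets show up as "<path> (deleted)".
    static int countOpenHandles(String path) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
            for (Path fd : fds) {
                try {
                    if (Files.readSymbolicLink(fd).toString().startsWith(path)) {
                        count++;
                    }
                } catch (IOException e) {
                    // The descriptor was closed while we were iterating; skip it.
                }
            }
        }
        return count;
    }

    // Returns {open handles after deleting the jar, open handles after closing the loader}.
    static int[] run() throws IOException {
        // Hypothetical stand-in for a job jar stored by the blob store.
        Path jar = Files.createTempFile("blob_", ".jar").toRealPath();
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry("hello.txt"));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        String path = jar.toString();

        // Parent null: only the jar is searched (the bootstrap loader has no hello.txt).
        URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()}, null);
        if (loader.getResource("hello.txt") == null) { // the lookup opens the jar file
            throw new IllegalStateException("resource not found in jar");
        }

        Files.delete(jar);                        // unlinked, like the "(deleted)" lsof entries
        int afterDelete = countOpenHandles(path); // the class loader still holds the jar open
        loader.close();                           // releasing the loader closes the jar
        int afterClose = countOpenHandles(path);
        return new int[] {afterDelete, afterClose};
    }

    public static void main(String[] args) throws IOException {
        int[] counts = run();
        System.out.println("open handles after delete: " + counts[0]);
        System.out.println("open handles after close:  " + counts[1]);
    }
}
```

Relying on garbage collection instead of `close()` would eventually release the handle too, but only once nothing references the loader any more, which matches the observation that `GC.run` helped only after the job had left the archive.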
    
    Note that we don't perform any cleanup on the TaskManager side. There we
    also end up with some leftover files, but they don't seem to pile up.
    Presumably the garbage collector can clean them up much earlier there.
    Also, unlike the web interface on the JobManager side, we don't keep
    references to old Task instances.
    
    @StephanEwen I'm thinking about adding a similar fix for the TaskManager 
side. What do you think?


> Finished jobs in yarn session fill /tmp filesystem
> --------------------------------------------------
>
>                 Key: FLINK-4485
>                 URL: https://issues.apache.org/jira/browse/FLINK-4485
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.1.0
>            Reporter: Niels Basjes
>            Assignee: Maximilian Michels
>            Priority: Blocker
>
> On a Yarn cluster I start a yarn-session with a few containers and task slots.
> Then I fire a 'large' number of Flink batch jobs in sequence against this
> yarn session. It is the exact same job (Java code), but with different
> parameters each time.
> In this scenario it exports HBase tables to files in HDFS, and the
> parameters specify which data from which tables to export and the name of
> the target directory.
> After running several dozen jobs, job submission started to fail and we
> investigated.
> We found that on the Yarn node hosting the JobManager, the /tmp file system
> was full (4GB, 100% used).
> However, the output of {{du -hcs /tmp}} showed only 200MB in use.
> We found that a very large file (we guess it is the job's jar) was put
> in /tmp, used, and deleted, yet the JobManager did not close its file handle.
> As soon as we killed the JobManager, the disk space was freed.
> In short, a yarn-session that receives enough jobs brings down the Yarn
> node for all users.
> See parts of the output we got from {{lsof}} below.
> {code}
> COMMAND   PID USER    FD   TYPE DEVICE     SIZE NODE NAME
> java    15034 nbasjes 550r REG  253,17 66219695  245 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000003 (deleted)
> java    15034 nbasjes 551r REG  253,17 66219695  252 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000007 (deleted)
> java    15034 nbasjes 552r REG  253,17 66219695  267 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000012 (deleted)
> java    15034 nbasjes 553r REG  253,17 66219695  250 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000005 (deleted)
> java    15034 nbasjes 554r REG  253,17 66219695  288 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000018 (deleted)
> java    15034 nbasjes 555r REG  253,17 66219695  298 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000025 (deleted)
> java    15034 nbasjes 557r REG  253,17 66219695  254 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000008 (deleted)
> java    15034 nbasjes 558r REG  253,17 66219695  292 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000019 (deleted)
> java    15034 nbasjes 559r REG  253,17 66219695  275 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000013 (deleted)
> java    15034 nbasjes 560r REG  253,17 66219695  159 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000002 (deleted)
> java    15034 nbasjes 562r REG  253,17 66219695  238 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000001 (deleted)
> java    15034 nbasjes 568r REG  253,17 66219695  246 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000004 (deleted)
> java    15034 nbasjes 569r REG  253,17 66219695  255 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000009 (deleted)
> java    15034 nbasjes 571r REG  253,17 66219695  299 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000026 (deleted)
> java    15034 nbasjes 572r REG  253,17 66219695  293 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000020 (deleted)
> java    15034 nbasjes 574r REG  253,17 66219695  256 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000010 (deleted)
> java    15034 nbasjes 575r REG  253,17 66219695  302 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000029 (deleted)
> java    15034 nbasjes 576r REG  253,17 66219695  294 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000021 (deleted)
> java    15034 nbasjes 577r REG  253,17 66219695  262 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000011 (deleted)
> java    15034 nbasjes 578r REG  253,17 66219695  251 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000006 (deleted)
> java    15034 nbasjes 580r REG  253,17 66219695  295 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000022 (deleted)
> java    15034 nbasjes 581r REG  253,17 66219695  300 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000027 (deleted)
> java    15034 nbasjes 582r REG  253,17 66219695  188 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/cache/blob_e318d1698aa6e7dc91e5f4a9f8ba29781aebd8c4 (deleted)
> java    15034 nbasjes 585r REG  253,17 66219695  279 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000014 (deleted)
> java    15034 nbasjes 586r REG  253,17 66219695  296 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000023 (deleted)
> java    15034 nbasjes 588r REG  253,17 66219695  301 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000028 (deleted)
> java    15034 nbasjes 589r REG  253,17 66219695  297 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000024 (deleted)
> java    15034 nbasjes 598r REG  253,17 66219695  280 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000015 (deleted)
> java    15034 nbasjes 601r REG  253,17 66219695  289 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000016 (deleted)
> java    15034 nbasjes 604r REG  253,17 66219695  284 /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000017 (deleted)
> {code}
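The gap in the report between {{du -hcs /tmp}} (200MB) and a full 4GB /tmp is the usual POSIX behaviour for a file that is deleted while still open: the name disappears, so `du` no longer sees it, but the data blocks stay allocated (and `df` keeps counting them) until the last descriptor is closed. A minimal Java sketch of that effect, assuming a Linux/POSIX file system (on Windows the delete would fail while the stream is open); the class name `DeletedButOpenDemo`, the `temp-` prefix and the 1 MiB size are arbitrary illustrations:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeletedButOpenDemo {

    // Writes `size` bytes to a temp file, opens it, deletes it while the stream is
    // still open, and returns how many bytes can still be read through that stream.
    static long readAfterDelete(int size) throws IOException {
        Path tmp = Files.createTempFile("temp-", ".bin");
        Files.write(tmp, new byte[size]);
        try (InputStream in = new FileInputStream(tmp.toFile())) {
            // Removes the directory entry, so tools that walk the tree (du) stop counting it.
            Files.delete(tmp);
            if (Files.exists(tmp)) {
                throw new IllegalStateException("file should no longer have a name");
            }
            long total = 0;
            byte[] buf = new byte[8192];
            int n;
            // The data blocks are still allocated (and counted by df) while `in` is open.
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } // Closing the last open descriptor is what finally frees the space.
    }

    public static void main(String[] args) throws IOException {
        System.out.println("bytes still readable after delete: " + readAfterDelete(1 << 20));
    }
}
```

This is also why killing the JobManager freed the space immediately: process exit closes every descriptor it still held.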



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
