Niels Basjes created FLINK-4485:
-----------------------------------

             Summary: Finished jobs in yarn session fill /tmp filesystem
                 Key: FLINK-4485
                 URL: https://issues.apache.org/jira/browse/FLINK-4485
             Project: Flink
          Issue Type: Bug
          Components: JobManager
    Affects Versions: 1.1.0
            Reporter: Niels Basjes
            Priority: Blocker

On a Yarn cluster I start a yarn-session with a few containers and task slots.
Then I fire a 'large' number of Flink batch jobs in sequence against this yarn session. It is the exact same job (Java code) each time, but with different parameters.

In this scenario it exports HBase tables to files in HDFS, and the parameters specify which data from which tables to export and the name of the target directory.

After running several dozen jobs, job submission started to fail, so we investigated.

We found that the cause was that, on the Yarn node hosting the jobmanager, the /tmp file system was full (4 GB, 100% used).
However, the output of {{du -hcs /tmp}} showed only 200 MB in use.

We found that a very large file (we guess it is the jar of the job) was put in /tmp, used, and deleted, yet the file handle was not closed by the jobmanager.

As soon as we killed the jobmanager, the disk space was freed.

See parts of the output we got from {{lsof}} below.

{code}
COMMAND PID   USER    FD   TYPE DEVICE SIZE     NODE NAME
java    15034 nbasjes 550r REG  253,17 66219695 245  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000003 (deleted)
java    15034 nbasjes 551r REG  253,17 66219695 252  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000007 (deleted)
java    15034 nbasjes 552r REG  253,17 66219695 267  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000012 (deleted)
java    15034 nbasjes 553r REG  253,17 66219695 250  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000005 (deleted)
java    15034 nbasjes 554r REG  253,17 66219695 288  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000018 (deleted)
java    15034 nbasjes 555r REG  253,17 66219695 298  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000025 (deleted)
java    15034 nbasjes 557r REG  253,17 66219695 254  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000008 (deleted)
java    15034 nbasjes 558r REG  253,17 66219695 292  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000019 (deleted)
java    15034 nbasjes 559r REG  253,17 66219695 275  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000013 (deleted)
java    15034 nbasjes 560r REG  253,17 66219695 159  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000002 (deleted)
java    15034 nbasjes 562r REG  253,17 66219695 238  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000001 (deleted)
java    15034 nbasjes 568r REG  253,17 66219695 246  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000004 (deleted)
java    15034 nbasjes 569r REG  253,17 66219695 255  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000009 (deleted)
java    15034 nbasjes 571r REG  253,17 66219695 299  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000026 (deleted)
java    15034 nbasjes 572r REG  253,17 66219695 293  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000020 (deleted)
java    15034 nbasjes 574r REG  253,17 66219695 256  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000010 (deleted)
java    15034 nbasjes 575r REG  253,17 66219695 302  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000029 (deleted)
java    15034 nbasjes 576r REG  253,17 66219695 294  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000021 (deleted)
java    15034 nbasjes 577r REG  253,17 66219695 262  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000011 (deleted)
java    15034 nbasjes 578r REG  253,17 66219695 251  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000006 (deleted)
java    15034 nbasjes 580r REG  253,17 66219695 295  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000022 (deleted)
java    15034 nbasjes 581r REG  253,17 66219695 300  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000027 (deleted)
java    15034 nbasjes 582r REG  253,17 66219695 188  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/cache/blob_e318d1698aa6e7dc91e5f4a9f8ba29781aebd8c4 (deleted)
java    15034 nbasjes 585r REG  253,17 66219695 279  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000014 (deleted)
java    15034 nbasjes 586r REG  253,17 66219695 296  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000023 (deleted)
java    15034 nbasjes 588r REG  253,17 66219695 301  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000028 (deleted)
java    15034 nbasjes 589r REG  253,17 66219695 297  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000024 (deleted)
java    15034 nbasjes 598r REG  253,17 66219695 280  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000015 (deleted)
java    15034 nbasjes 601r REG  253,17 66219695 289  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000016 (deleted)
java    15034 nbasjes 604r REG  253,17 66219695 284  /tmp/blobStore-fbe9c4cf-1f85-48cb-aad9-180e8d4ec7ce/incoming/temp-00000017 (deleted)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
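
For reference, the behaviour described above (a file that is deleted but still held open keeps occupying disk space, while {{du}} no longer counts it) can be reproduced in isolation. The following is only a minimal sketch and is not taken from the Flink code base: the class name {{DeletedOpenFileDemo}} and the method {{bytesReadableAfterDelete}} are made up for illustration, and it assumes POSIX unlink semantics as on the Linux Yarn nodes. It also shows try-with-resources as one common way to make sure a read handle is released; whether that matches the actual fix needed in the jobmanager is not known from this report.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Flink.
final class DeletedOpenFileDemo {

    /**
     * Creates a temp file of the given size, opens it for reading,
     * deletes it while the stream is still open, and then counts how
     * many bytes can still be read through the open handle.
     */
    static long bytesReadableAfterDelete(int sizeBytes) throws IOException {
        Path tmp = Files.createTempFile("blob-demo-", ".tmp");
        Files.write(tmp, new byte[sizeBytes]);

        // try-with-resources closes the stream when the block exits,
        // which is the point at which the OS can reclaim the space.
        try (InputStream in = Files.newInputStream(tmp)) {
            // Assumption: POSIX semantics, where removing the directory
            // entry succeeds even though the file is still open.
            Files.delete(tmp);

            long total = 0;
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(bytesReadableAfterDelete(1 << 20));
    }
}
```

If the stream in this sketch were never closed, the deleted file would stay allocated for as long as the process lives and would appear in {{lsof}} with the {{(deleted)}} suffix, which is the pattern in the listing above.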