Hi,
I'm having trouble running the Flink TaskManager in a Docker
container (OpenShift). The container's internal storage is filling up,
probably because the deleted jar files in the blob store are still held
open, so their disk space is never freed.
We use Apache Beam to start the Apache Flink jobs, so the jars
are sent to Apache Flink every time we fire a batch.
I enabled debug logging, but I can't find anything in the logs about
these deletes. Does anyone have an idea why the resources are not freed?
I checked the blob store, and the deleted files are indeed the jars:
208875129 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/142 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_90964be94a2f4471844a00284e44fb32/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ffa3f85003b1f124cd1cccdb0f72a8e0\
(deleted)
208875130 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/143 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_b7c00268b488411a8f6e1af984bcdcc2/blob_p-5202910b36af8c12548df97a7e4a057b77786217-8bab07adb34d4ce8de20846ec72059ce\
(deleted)
208875131 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/144 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_46183ac02f1dcd3543f8e481f59948b5/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ac6bc86d8932e7d631416d9bafab4ab4\
(deleted)
208875132 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/145 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_717bf3f4b3f80700c1cc44d6076c2aca/blob_p-5202910b36af8c12548df97a7e4a057b77786217-780dd2383dee11a2361ac20a5da7bbb8\
(deleted)
208875133 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/146 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_22e67caac65c9c4e537caa3b072b8cc3/blob_p-5202910b36af8c12548df97a7e4a057b77786217-e0b523663672c641b368e5d1440b0b70\
(deleted)
208875134 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/147 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_3afe5b02ccb95b3494a1acd8677c66f0/blob_p-5202910b36af8c12548df97a7e4a057b77786217-9a8cd48c09a4b518adf0309a0255b339\
(deleted)
208875135 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/148 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_cb024c561531905e81c9768ec62a2fe0/blob_p-5202910b36af8c12548df97a7e4a057b77786217-0addc83aaf9a2f781528ad035fd79cc8\
(deleted)
208875136 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/149 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_d3dc0b0608d71ffa77575771f088e80e/blob_p-5202910b36af8c12548df97a7e4a057b77786217-c9015b012ec4b249f32872471a31a500\
(deleted)
208875137 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/150 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_1b4cdb127bb2c345e1b099e3e446bf58/blob_p-5202910b36af8c12548df97a7e4a057b77786217-ac4457b393b7ff0565c47c1e38786005\
(deleted)
208875138 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/151 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_8c23503c614a88e8c8f7a54a31e41886/blob_p-5202910b36af8c12548df97a7e4a057b77786217-d096b3ef150bf7e8e98224e0b8f17292\
(deleted)
208875139 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/152 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_e7c8132da483bd14e5abfe9390adeeb1/blob_p-5202910b36af8c12548df97a7e4a057b77786217-f370d8dcad0cb36581f9a5f1568e1487\
(deleted)
208875140 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/153 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_cbee9f15b0c6adba0f5ddb67b587b607/blob_p-5202910b36af8c12548df97a7e4a057b77786217-9ae77c3419d77adab8f44258ca4290c5\
(deleted)
208875141 0 lr-x------ 1 1000150000 root 64 Apr 18 12:58
/proc/1/fd/154 ->
/var/tmp/flink/blobStore-580cc38d-44e4-45a1-8922-e21c00d73dec/job_29c5a145ae231be4c0d53717625c3938/blob_p-5202910b36af8c12548df97a7e4a057b77786217-76bb4d83f962a887d41effb2646bd63d\
(deleted)
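For reference, the same check can be done from inside the JVM. Below is a minimal sketch, not Flink code: the class and method names are made up for illustration, and it assumes a Linux /proc filesystem, where the link target of an unlinked but still open file ends in " (deleted)".

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Flink): lists open file descriptors
// whose target file has already been unlinked.
final class DeletedFdScan {

    // On Linux, /proc/<pid>/fd link targets of unlinked files end in " (deleted)".
    static boolean isDeletedTarget(String linkTarget) {
        return linkTarget.endsWith(" (deleted)");
    }

    // Scans a directory such as /proc/self/fd and returns "fd -> target" entries
    // for deleted targets. Returns an empty list if the directory is missing.
    static List<String> findDeletedOpenFiles(Path fdDir) {
        List<String> result = new ArrayList<>();
        if (!Files.isDirectory(fdDir)) {
            return result;
        }
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    String target = Files.readSymbolicLink(fd).toString();
                    if (isDeletedTarget(target)) {
                        result.add(fd + " -> " + target);
                    }
                } catch (IOException | RuntimeException e) {
                    // The descriptor was closed in the meantime, or the entry
                    // is not a readable link; skip it.
                }
            }
        } catch (IOException | DirectoryIteratorException e) {
            // The directory could not be read; return what was collected so far.
        }
        return result;
    }
}
```

Calling DeletedFdScan.findDeletedOpenFiles(Paths.get("/proc/self/fd")) (with java.nio.file.Paths imported) from within the TaskManager process should return entries like the ones in the listing above, provided the process is allowed to read its own /proc entries.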
There are several places in the code where the boolean returned by the
file delete call is ignored, so we have no way of knowing whether the file
was deleted successfully. Maybe these calls could be changed to something
like java.nio.file.Files.delete, which throws an IOException when something
goes wrong. This is not a solution, but it would make failures more
transparent.
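To illustrate the difference between the two delete calls, here is a minimal sketch. The class and method names are made up for this example and are not taken from the Flink code base; only java.io.File.delete and java.nio.file.Files.delete are real JDK calls.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of Flink): contrasts the two delete APIs.
final class BlobDeleteSketch {

    // java.io.File#delete only returns false on failure. If the caller
    // ignores the return value, the failure goes unnoticed, and the reason
    // is not available in any case.
    static boolean deleteLegacy(File file) {
        return file.delete();
    }

    // java.nio.file.Files#delete throws on failure, for example
    // NoSuchFileException, DirectoryNotEmptyException or another IOException,
    // so the cause ends up in the logs or in the caller's error handling.
    static void deleteStrict(Path path) throws IOException {
        Files.delete(path);
    }
}
```

Calling deleteStrict on a path that no longer exists throws a NoSuchFileException, while deleteLegacy on the same path just returns false. Note that on Linux, deleting a file that is still open normally succeeds with either call, so this would only surface real delete failures and would not by itself explain the open descriptors above.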
Thanks,
Jeroen