Hello, we are using YarnClusterClient for job submission. After a job finishes, whether it succeeded or failed, the blob file for that job appears to be deleted, but the Flink process still holds a handle to it. As a result, the file's disk space is not released on the machine, and we ran into a "No space left on device" error. Restarting the Flink cluster brings things back to normal, but we submit a large number of jobs, and frequent cluster restarts are not a solution.
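To illustrate the general mechanism we suspect (this is not Flink code, only a minimal sketch assuming a POSIX filesystem; the function name deleted_but_open_demo is made up for the example): when a file is unlinked while a process still has it open, the directory entry disappears, but the data stays allocated until the last descriptor is closed.

```python
import os
import tempfile


def deleted_but_open_demo(size=4096):
    """Create a file, keep it open, unlink it, and inspect what remains.

    Hypothetical helper for illustration only; assumes POSIX semantics.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)
        os.unlink(path)          # the name is removed from the directory
        st = os.fstat(fd)        # the open descriptor still reaches the data
        return {
            "path_exists": os.path.exists(path),
            "link_count": st.st_nlink,
            "size_still_held": st.st_size,
        }
    finally:
        # Only after the last descriptor is closed can the space be reclaimed.
        os.close(fd)


if __name__ == "__main__":
    print(deleted_but_open_demo())
```

If the same thing is happening inside the Flink container process, the space would only be freed once that process closes its handle or exits, which would match what we see after a cluster restart.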
The results of lsof are as follows.

During job execution:

lsof /flinkDir | grep job_dbafb671b0d60ed8a8ec2651fe59303b
java 11883 yarn mem REG 253,2 112384928 109973177 /flinkDir/yarn/../application_1536668870638_5555/blobStore-a1bcdbd4-5388-4c56-8052-6051f5af38dd/job_dbafb671b0d60ed8a8ec2651fe59303b/blob_p-8771d9ccac35e28d8571ac8957feaaecdebaeadd-7748aec7fe7369ca26181d0f94b1a578
java 11883 yarn 1837r REG 253,2 112384928 109973177 /flinkDir/yarn/../application_1536668870638_5555/blobStore-a1bcdbd4-5388-4c56-8052-6051f5af38dd/job_dbafb671b0d60ed8a8ec2651fe59303b/blob_p-8771d9ccac35e28d8571ac8957feaaecdebaeadd-7748aec7fe7369ca26181d0f94b1a578

After job execution:

lsof /flinkDir | grep job_dbafb671b0d60ed8a8ec2651fe59303b
java 11883 yarn DEL REG 253,2 109973177 /flinkDir/yarn/../application_1536668870638_5555/blobStore-a1bcdbd4-5388-4c56-8052-6051f5af38dd/job_dbafb671b0d60ed8a8ec2651fe59303b/blob_p-8771d9ccac35e28d8571ac8957feaaecdebaeadd-7748aec7fe7369ca26181d0f94b1a578
java 11883 yarn 1837r REG 253,2 112384928 109973177 /flinkDir/yarn/../application_1536668870638_5555/blobStore-a1bcdbd4-5388-4c56-8052-6051f5af38dd/job_dbafb671b0d60ed8a8ec2651fe59303b/blob_p-8771d9ccac35e28d8571ac8957feaaecdebaeadd-7748aec7fe7369ca26181d0f94b1a578 *(deleted)*

So the blob file is marked as deleted, but it still occupies disk space because the Flink container process (PID 11883) still holds an open read handle (file descriptor 1837) to it.

Can you please advise how we can avoid this situation, or whether it is caused by a bug in Flink?

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/