Hi,

I think it would be very helpful if you could identify what data is behind the
dangling file. For example, I could imagine that it is a jar file that was used
by the TM and that some of its classes are still in use, or still loaded by a
classloader that has not been GCed yet. Depending on that, the problem could be
in the user code, in Flink's classloading, or in the blob storage. I would
suggest opening a Jira issue and supplying as much information about the
dangling file as possible (e.g. by concluding from the log which blob key was
mapped to which file, from the size, or by peeking at the content).
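
One possible way to peek at the content of such a file is sketched below. It
is only a rough illustration and not part of Flink: it assumes a Linux host,
where an unlinked file that is still open remains readable through
/proc/<pid>/fd/<fd>, and it assumes you run it as the same user as the process
(or as root). The helper names deleted_open_files and describe_blob are
hypothetical.

```python
import os
import zipfile
from pathlib import Path


def deleted_open_files(pid):
    """Return (fd_path, target) pairs for descriptors of `pid` whose file was unlinked."""
    fd_dir = Path("/proc") / str(pid) / "fd"
    found = []
    for fd in fd_dir.iterdir():
        try:
            target = os.readlink(fd)
        except OSError:
            continue  # descriptor was closed while we were scanning
        # Linux appends " (deleted)" to the link target of an unlinked file.
        if target.endswith(" (deleted)"):
            found.append((str(fd), target))
    return found


def describe_blob(path, max_entries=10):
    """Report the size and, if the file is a zip/jar archive, its first entries."""
    info = {
        "size": os.path.getsize(path),
        "is_jar": zipfile.is_zipfile(path),
        "entries": [],
    }
    if info["is_jar"]:
        with zipfile.ZipFile(path) as archive:
            info["entries"] = archive.namelist()[:max_entries]
    return info
```

With the PID from the lsof output, calling deleted_open_files(11883) and then
describe_blob on each returned fd path should show whether the dangling file is
a jar and which classes or resources it contains.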

Best,
Stefan

> On 19.09.2018 at 16:04, Piotr Szczepanek <piotr.szczepa...@gmail.com> wrote:
> 
> Hello,
> 
> we are using YarnClusterClient for job submission. After a successful or
> failed job execution, the blob file for that job appears to be deleted, but
> the Flink process still holds a handle to that file. As a result, the file is
> not removed from the machine, and we ran into a "No space left on device"
> error. Restarting the Flink cluster brought things back to normal, but we are
> submitting a large number of jobs, and frequent cluster restarts are not a
> solution.
> 
> Results of lsof are:
> During job execution:
> lsof /flinkDir | grep job_dbafb671b0d60ed8a8ec2651fe59303b
> java    11883  yarn  mem    REG  253,2 112384928 109973177
> /flinkDir/yarn/../application_1536668870638_5555/blobStore-a1bcdbd4-5388-4c56-8052-6051f5af38dd/job_dbafb671b0d60ed8a8ec2651fe59303b/blob_p-8771d9ccac35e28d8571ac8957feaaecdebaeadd-7748aec7fe7369ca26181d0f94b1a578
> java    11883  yarn 1837r   REG  253,2 112384928 109973177
> /flinkDir/yarn/../application_1536668870638_5555/blobStore-a1bcdbd4-5388-4c56-8052-6051f5af38dd/job_dbafb671b0d60ed8a8ec2651fe59303b/blob_p-8771d9ccac35e28d8571ac8957feaaecdebaeadd-7748aec7fe7369ca26181d0f94b1a578
> 
> After job execution:
> lsof /flinkDir | grep job_dbafb671b0d60ed8a8ec2651fe59303b
> java    11883  yarn  DEL    REG  253,2           109973177
> /flinkDir/yarn/../application_1536668870638_5555/blobStore-a1bcdbd4-5388-4c56-8052-6051f5af38dd/job_dbafb671b0d60ed8a8ec2651fe59303b/blob_p-8771d9ccac35e28d8571ac8957feaaecdebaeadd-7748aec7fe7369ca26181d0f94b1a578
> java    11883  yarn 1837r   REG  253,2 112384928 109973177
> /flinkDir/yarn/../application_1536668870638_5555/blobStore-a1bcdbd4-5388-4c56-8052-6051f5af38dd/job_dbafb671b0d60ed8a8ec2651fe59303b/blob_p-8771d9ccac35e28d8571ac8957feaaecdebaeadd-7748aec7fe7369ca26181d0f94b1a578
> *(deleted)*
> 
> So the blob file is marked as deleted, but it is still present because the
> Flink container process still holds a handle to it.
> Can you please advise how we can avoid this situation, or whether it is
> caused by a bug in Flink?
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
