Hello, I'm not sure whether this is caused by a bad configuration on my side or by an inconsistency in the documentation, but according to this document:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture

"If a job fails, all non-HA files' refCounts are reset to 0; all HA files' refCounts remain and will not be increased again on recovery."

But in the JobManager's code, when the job status changes to failed and the JobManager receives the corresponding message, it sends a RemoveJob message to itself. That invokes the removeJob() function, which always calls the following functions:

    libraryCacheManager.unregisterJob(jobID)
    blobServer.cleanupJob(jobID, removeJobFromStateBackend)
    jobManagerMetricGroup.removeJob(jobID)

As far as I understand, this removes the blob entries immediately. According to the doc, it should only freeze the refCounts for HA files and reset the refCounts for non-HA files to allow their later removal.

Is the doc right, and have I missed something here?

Thanks in advance.
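
P.S. To make the difference concrete, here is a minimal Python sketch of the two behaviours as I understand them. It is a toy model, not Flink code: ToyBlobStore, register() and both on_job_failed_* methods are hypothetical names made up for illustration. The "observed" variant simply assumes that all entries are dropped at once and ignores whatever the removeJobFromStateBackend flag does.

```python
class ToyBlobStore:
    """Toy stand-in for a BLOB store with reference counts (not Flink's BlobServer)."""

    def __init__(self):
        # blob key -> {"ha": bool, "ref_count": int}
        self.blobs = {}

    def register(self, key, ha):
        """Add one reference to a blob, creating the entry if needed."""
        entry = self.blobs.setdefault(key, {"ha": ha, "ref_count": 0})
        entry["ref_count"] += 1

    def on_job_failed_as_documented(self):
        """FLIP-19 wording: non-HA refCounts are reset to 0, HA refCounts remain."""
        for entry in self.blobs.values():
            if not entry["ha"]:
                entry["ref_count"] = 0

    def on_job_failed_as_observed(self):
        """My reading of removeJob(): blob entries are removed immediately (assumption)."""
        self.blobs.clear()


def make_store():
    """Build a store holding one HA blob and one non-HA blob, each referenced once."""
    store = ToyBlobStore()
    store.register("jar-ha", ha=True)
    store.register("jar-local", ha=False)
    return store


documented = make_store()
documented.on_job_failed_as_documented()

observed = make_store()
observed.on_job_failed_as_observed()

print("documented:", documented.blobs)
print("observed:  ", observed.blobs)
```

In the documented variant both entries still exist after the failure (only the non-HA refCount drops to 0), whereas in the observed variant nothing is left to recover from.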