Thank you very much, Robert! The problem is that the job/task manager shutdown methods are never called. When using the scripts, the task/job manager processes get killed, so their shutdown methods never get a chance to run.
@Till: Do you know whether there is a mechanism in Akka to register the actors for JVM shutdown hooks? I tried to register a shutdown hook via Runtime.getRuntime().addShutdownHook(), but I didn't manage to get a reference to the task manager.

On Thu, Feb 5, 2015 at 3:29 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Robert,
>
> thanks for the info. If the TaskManager/JobManager does not shut down
> properly, i.e. the process is killed, then it is indeed the case that the
> BlobManager cannot properly remove all stored files. I don't know if this
> was lately the case for you. Furthermore, the files are not directly
> deleted after the job has finished. Internally there is a cleanup task
> which is triggered every hour and deletes all blobs which are no longer
> referenced.
>
> But we definitely have to look into it to see how we could improve this
> behaviour.
>
> Greets,
>
> Till
>
> On Thu, Feb 5, 2015 at 3:21 PM, Robert Waury <robert.wa...@googlemail.com>
> wrote:
>
>> I talked with the admins. The problem seemed to have been that the disk
>> was full and Flink couldn't create the directory.
>>
>> Maybe the error message should reflect if that is the cause.
>>
>> While cleaning up the disk we noticed that a lot of temporary blobStore
>> files were not deleted by Flink after the job finished. This seemed to have
>> caused or at least worsened the problem.
>>
>> Cheers,
>> Robert
>>
>> On Thu, Feb 5, 2015 at 1:14 PM, Ufuk Celebi <u...@apache.org> wrote:
>>
>>> On Thu, Feb 5, 2015 at 11:23 AM, Robert Waury <
>>> robert.wa...@googlemail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I can reproduce the error on my cluster.
>>>>
>>>> Unfortunately I can't check whether the parent directories were created
>>>> on the different nodes since I have no way of accessing them. I start all
>>>> the jobs from a gateway.
>>>>
>>>
>>> I've added a check to the directory creation (in branches release-0.8
>>> and master), which should fail with a proper error message if that is the
>>> problem. If you have time to (re)deploy Flink, it would be great to know if
>>> that indeed is the issue. Otherwise, we need to further investigate this.
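
For reference, regarding the shutdown-hook question at the top of this mail: a minimal sketch of a plain JVM shutdown hook that removes a temporary storage directory. It assumes the cleanup only needs the directory path, so no reference to the task manager actor is required. The class name BlobCleanupHook, the methods deleteRecursively and registerCleanupHook, and the "blobStore-sketch-" directory prefix are all made up for illustration and are not Flink or Akka APIs. Note also that the JVM runs shutdown hooks on a normal exit or on SIGTERM, but not when the process is killed with SIGKILL (kill -9), so this only helps if the scripts terminate the processes gently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: cleans a storage directory on JVM exit.
public class BlobCleanupHook {

    /** Deletes a directory tree, visiting children before their parents. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse lexicographic order puts nested entries before their parent.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Registers the hook and returns it so callers can remove it again. */
    static Thread registerCleanupHook(Path storageDir) {
        Thread hook = new Thread(() -> {
            try {
                deleteRecursively(storageDir);
            } catch (IOException | UncheckedIOException e) {
                // Nothing more can be done this late; just report the failure.
                System.err.println("Could not clean up " + storageDir + ": " + e);
            }
        }, "blob-cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws Exception {
        // Assumed stand-in for the real blob storage directory.
        Path dir = Files.createTempDirectory("blobStore-sketch-");
        Files.createFile(dir.resolve("blob_0"));
        registerCleanupHook(dir);
        System.out.println("Files in " + dir + " will be removed on JVM exit");
    }
}
```

Capturing only the path keeps the hook independent of the actor system, which may already be shutting down when the hook runs.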