Thank you very much, Robert!

The problem is that the job/task manager shutdown methods are never called.
The scripts stop the task/job manager processes by killing them, so the
shutdown methods never get a chance to run.

@Till: Do you know whether there is a mechanism in Akka to register the
actors for JVM shutdown hooks? I tried to register a shutdown hook via
Runtime.getRuntime().addShutdownHook(), but I didn't manage to get a
reference to the task manager.
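
For reference, a minimal sketch of the kind of hook I have in mind. It
assumes the hook only needs the blob storage directory rather than a
reference to the task manager itself. BlobCleanupHook, register and
deleteRecursively are hypothetical names, not existing Flink or Akka
code. Note that JVM shutdown hooks run on a normal kill (SIGTERM) but
not on kill -9 (SIGKILL).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Flink or Akka.
final class BlobCleanupHook {

    private BlobCleanupHook() {}

    // Deletes dir and everything below it; does nothing if dir is absent.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Registers a JVM shutdown hook that removes storageDir on exit.
    // The returned thread can be passed to removeShutdownHook() later.
    static Thread register(Path storageDir) {
        Thread hook = new Thread(() -> {
            try {
                deleteRecursively(storageDir);
            } catch (IOException | UncheckedIOException e) {
                System.err.println("Could not clean up " + storageDir + ": " + e);
            }
        }, "blob-cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

The component that creates the storage directory could call
BlobCleanupHook.register(dir) once at startup and remove the hook again
in its regular shutdown method, so the cleanup does not run twice.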


On Thu, Feb 5, 2015 at 3:29 PM, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Robert,
>
> thanks for the info. If the TaskManager/JobManager does not shut down
> properly, e.g. because the process is killed, then it is indeed the case
> that the BlobManager cannot properly remove all stored files. I don't know
> whether this was the case for you recently. Furthermore, the files are not
> deleted directly after the job has finished. Internally there is a cleanup
> task which is triggered every hour and deletes all blobs which are no
> longer referenced.
>
> But we definitely have to look into it to see how we could improve this
> behaviour.
>
> Greets,
>
> Till
>
> On Thu, Feb 5, 2015 at 3:21 PM, Robert Waury <robert.wa...@googlemail.com>
> wrote:
>
>> I talked with the admins. The problem seems to have been that the disk
>> was full and Flink couldn't create the directory.
>>
>> Maybe the error message should reflect that when it is the cause.
>>
>> While cleaning up the disk we noticed that a lot of temporary blobStore
>> files were not deleted by Flink after the job finished. This seems to have
>> caused or at least worsened the problem.
>>
>> Cheers,
>> Robert
>>
>> On Thu, Feb 5, 2015 at 1:14 PM, Ufuk Celebi <u...@apache.org> wrote:
>>
>>> On Thu, Feb 5, 2015 at 11:23 AM, Robert Waury <
>>> robert.wa...@googlemail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I can reproduce the error on my cluster.
>>>>
>>>> Unfortunately I can't check whether the parent directories were created
>>>> on the different nodes since I have no way of accessing them. I start all
>>>> the jobs from a gateway.
>>>>
>>>
>>> I've added a check to the directory creation (in branches release-0.8
>>> and master), which should fail with a proper error message if that is the
>>> problem. If you have time to (re)deploy Flink, it would be great to know if
>>> that indeed is the issue. Otherwise, we need to further investigate this.
>>>
>>>
>>>
>>
>
